Claude Elwood Shannon
Miscellaneous Writings
Edited by
N. J. A. Sloane
Aaron D. Wyner
Back in 1993, the late Aaron Wyner and I edited Claude Elwood Shannon's
papers, and most of them appeared in a volume (Claude Elwood
Shannon's Collected Papers) which was published by the IEEE Press.
However, there were a number of items of lesser interest written by Shannon
which we did not include (some declassified wartime memoranda,
obscure AT&T Bell Labs memos, some mimeographed MIT lecture notes, etc.).
These we put into a binder, held together by an Acco metal strip.
We made half a dozen copies, and gave copies to the Library
of Congress, the British Library, the Bell Laboratories Library,
the MIT Library, to Claude Shannon himself, and to one or two other places.
Over the years many people have asked me if it was possible to get access
to this collection.
I have now had this volume scanned and converted to PDF files.
The total size of the files is about 450 megabytes.
Neil J. A. Sloane, October 13, 2013
Mathematical Sciences Research Center, AT&T Bell Laboratories, Murray Hill,
New Jersey 07974
CONTENTS
File 1: Front matter
This volume contains the following items. Bracketed numbers refer to the bibliography.
(Note that many of these files contain more than one document.)
File 5: [5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum
MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.
File 7: [7] "A Study of the Deflection Mechanism and Some Results on Rate Finders,"
Report to National Defense Research Committee, Div. 7-311-M1, circa April,
1941, 37 pp. + 15 figs.
File 9: [9] "A Height Data Smoothing Mechanism," Report to National Defense Research
Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.
File 11: [11] "Some Experimental Results on the Deflection Mechanism," Report to National
Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.
File 12: [12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8,
1941, 5 pp. + 3 figs.
File 16: [16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen
Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense
Research Committee, July 15, 1943, 9 pp.
File 16: [19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944,
Bell Laboratories, 2 pp. + 3 Figs.
File 16: [20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell
Laboratories, 1 p. + 1 fig.
File 21: [21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver,"
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.
File 21: [23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript,
August 4, 1944, Bell Laboratories, 4 pp.
File 24: [24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept.
1, 1945, Bell Laboratories, 114 pp. + 25 figs.
File 26: [26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell
Laboratories, 17 pp.
File 27: [27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in
Fire-Control Systems," Summary Technical Report, Div. 7, National Defense
Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159
and 166-167. AD 200795. Also in National Military Establishment Research and
Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by
[51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory
and Practice, Addison-Wesley, Reading, Mass., 1965.
File 30: [30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four-
Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM
46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.
File 31: [31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946,
Bell Laboratories, 5 pp. + 1 fig.
File 31: [32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5
pp. + 1 fig.
File 31: [34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5
pp.
File 31: [35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15,
1948, 2 pp.
File 36: [36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.
File 36: [45] "Significance and Application [of Communication Research]," Symposium on
Communication Research, 11-13 October, 1948, Research and Development
Board, Department of Defense, Washington, DC, pp. 14-23, 1948.
File 46: [46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell
Laboratories, 1 p.
File 46: [47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18,
1948, Bell Laboratories, 2 pp.
File 46: [48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6 1948, Bell
Laboratories, 2 pp. + 2 Figs.
File 46: [49] "Information Theory," Typescript of abstract of talk for American Statistical
Society, 1949, 5 pp.
File 46: [58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2
pp.
File 59: [59] "A Digital Method of Transmitting Information," Typescript, no date, circa
1950, Bell Laboratories, 3 pp.
File 59: [72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.
File 59: [74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-
9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.
File 59: [77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7
pp.
File 78: [78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.
File 78: [81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving
the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.
File 78: [84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53-
140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs.
File 78: [87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited
Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 Fig.
File 78: [95] "Concavity of Transmission Rate as a Function of Input Probabilities,"
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.
File 104: [104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology,
1956 and succeeding years. Contains the following sections:
"A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of
martingales and related questions," 19 pp. "Some useful inequalities for
distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9
pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp.
"Upper and lower bounds for powers of a matrix with non-negative elements," 3
pp. "The number of sequences of a given length," 3 pp. "Characteristic for a
language with independent letters," 4 pp. "The probability of error in optimal
codes," 5 pp. "Zero error codes and the zero error capacity C0," 10 pp.
"Lower bound for Pef for a completely connected channel with feedback," 1 p.
"A lower bound for Pe when R > C," 2 pp. "A lower bound for Pe," 2 pp.
"Lower bound with one type of input and many types of output," 3 pp.
"Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for
the memoryless feedback channel," 1 p. "Continuity of Pe opt as a function of
transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of
Pe to p," 2 pp. "Bound on Pe for random code by simple threshold argument," 4
pp. "A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2 pp.
"Relations between probability and minimum word separation," 4 pp.
"Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a
function of transition probabilities," 1 p. "A geometric interpretation of
channel capacity," 6 pp. "Log moment generating function for the square of a
Gaussian variate," 2 pp. "Upper bound on Pe for Gaussian channel by
expurgated random code," 2 pp. "Lower bound on Pe in Gaussian channel by
minimum distance argument," 2 pp. "The sphere packing bound for the
Gaussian power limited channel," 4 pp. "The T-terminal channel," 7 pp.
"Conditions for constant mutual information," 2 pp. "The central limit theorem
with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the
distribution function," 5 pp. "Generalized Chebycheff and Chernoff
inequalities," 1 p. "Channels with side information at the transmitter," 13 pp.
"Some miscellaneous results in coding theory," 15 pp. "Error probability
bounds for noisy channels," 20 pp.
File 105: [105] "Reliable Machines from Unreliable Components," notes of five lectures,
Massachusetts Institute of Technology, Spring 1956, 24 pp.
File 105: [106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by
W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.
File 105: [107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a
lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.
File 105: [108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture,
Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
File 105: [124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American
Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7
pp. + 8 figs.
File 105: [127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.
Preface
This volume contains all of Claude Elwood Shannon's writings that we did not include in
his Collected Papers. *
* Claude Elwood Shannon: Collected Papers, edited by N. J. A. Sloane and A. D. Wyner, IEEE Press,
New York, 1993, xliv + 924 pp. ISBN 0-7803-0434-9.
Contents
Photograph of Claude Shannon at Bell Labs in May 1952. Caption: "In 1952, Claude E.
Shannon of Bell Laboratories devised an experiment to illustrate the capabilities of
telephone relays. Here, an electrical mouse finds its way unerringly through a maze,
guided by information remembered in the kind of switching relays used in dial telephone
systems. Experiments with the mouse helped stimulate Bell Laboratories researchers to
think of new ways to use the logical powers of computers for operations other than
numerical calculation."
Photograph of Claude Shannon and Dave Hagelbarger at Bell Labs in March 1955.
Caption: "Claude Shannon, the originator of Information Theory, at the board and Dave
Hagelbarger work out some equations needed. Their current projects include work on
automata - advanced type of computing machines which are able to perform various
thought functions."
Photograph of Claude Shannon taken in the 1980s. Photographer unknown.
Preface
Bibliography of Claude Elwood Shannon. Comments such as "Included in Part B" refer
to Parts A, B, C, D of the Collected Papers mentioned in the Preface.
This volume contains the following items. Bracketed numbers refer to the bibliography.
[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum
MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.
[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders,"
Report to National Defense Research Committee, Div. 7-311-M1, circa April,
1941, 37 pp. + 15 figs.
[9] "A Height Data Smoothing Mechanism," Report to National Defense Research
Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.
[11] "Some Experimental Results on the Deflection Mechanism," Report to National
Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.
[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8,
1941, 5 pp. + 3 figs.
[16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen
Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense
Research Committee, July 15, 1943, 9 pp.
[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944,
Bell Laboratories, 2 pp. + 3 Figs.
[20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell
Laboratories, 1 p. + 1 fig.
[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver,"
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.
[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript,
August 4, 1944, Bell Laboratories, 4 pp.
[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept.
1, 1945, Bell Laboratories, 114 pp. + 25 figs.
[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell
Laboratories, 17 pp.
[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in
Fire-Control Systems," Summary Technical Report, Div. 7, National Defense
Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159
and 166-167. AD 200795. Also in National Military Establishment Research and
Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by
[51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory
and Practice, Addison-Wesley, Reading, Mass., 1965.
[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four-
Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM
46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.
[31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946,
Bell Laboratories, 5 pp. + 1 fig.
[32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5
pp. + 1 fig.
[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5
pp.
[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15,
1948, 2 pp.
[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.
[45] "Significance and Application [of Communication Research]," Symposium on
Communication Research, 11-13 October, 1948, Research and Development
Board, Department of Defense, Washington, DC, pp. 14-23, 1948.
[46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell
Laboratories, 1 p.
[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18,
1948, Bell Laboratories, 2 pp.
[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6 1948, Bell
Laboratories, 2 pp. + 2 Figs.
[49] "Information Theory," Typescript of abstract of talk for American Statistical
Society, 1949, 5 pp.
[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2
pp.
[59] "A Digital Method of Transmitting Information," Typescript, no date, circa
1950, Bell Laboratories, 3 pp.
[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.
[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-
9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.
[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7
pp.
[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.
[81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving
the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.
[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53-
140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs.
[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited
Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 Fig.
[95] "Concavity of Transmission Rate as a Function of Input Probabilities,"
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.
[104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology,
1956 and succeeding years. Contains the following sections:
"A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of
martingales and related questions," 19 pp. "Some useful inequalities for
distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9
pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp.
"Upper and lower bounds for powers of a matrix with non-negative elements," 3
pp. "The number of sequences of a given length," 3 pp. "Characteristic for a
language with independent letters," 4 pp. "The probability of error in optimal
codes," 5 pp. "Zero error codes and the zero error capacity C0," 10 pp.
"Lower bound for Pef for a completely connected channel with feedback," 1 p.
"A lower bound for Pe when R > C," 2 pp. "A lower bound for Pe," 2 pp.
"Lower bound with one type of input and many types of output," 3 pp.
"Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for
the memoryless feedback channel," 1 p. "Continuity of Pe opt as a function of
transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of
Pe to p," 2 pp. "Bound on Pe for random code by simple threshold argument," 4
pp. "A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2 pp.
"Relations between probability and minimum word separation," 4 pp.
"Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a
function of transition probabilities," 1 p. "A geometric interpretation of
channel capacity," 6 pp. "Log moment generating function for the square of a
Gaussian variate," 2 pp. "Upper bound on Pe for Gaussian channel by
expurgated random code," 2 pp. "Lower bound on Pe in Gaussian channel by
minimum distance argument," 2 pp. "The sphere packing bound for the
Gaussian power limited channel," 4 pp. "The T-terminal channel," 7 pp.
"Conditions for constant mutual information," 2 pp. "The central limit theorem
with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the
distribution function," 5 pp. "Generalized Chebycheff and Chernoff
inequalities," 1 p. "Channels with side information at the transmitter," 13 pp.
"Some miscellaneous results in coding theory," 15 pp. "Error probability
bounds for noisy channels," 20 pp.
[105] "Reliable Machines from Unreliable Components," notes of five lectures,
Massachusetts Institute of Technology, Spring 1956, 24 pp.
[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by
W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.
[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a
lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.
[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture,
Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American
Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7
pp. + 8 figs.
[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.
Bibliography of Claude Elwood Shannon
[1] "A Symbolic Analysis of Relay and Switching Circuits," Transactions
American Institute of Electrical Engineers, Vol. 57 (1938), pp. 713-723.
(Received March 1, 1938.) Included in Part B.
[2] Letter to Vannevar Bush, Feb. 16, 1939. Printed in F.-W. Hagemeyer,
Die Entstehung von Informationskonzepten in der Nachrichtentechnik:
eine Fallstudie zur Theoriebildung in der Technik in Industrie- und
Kriegsforschung [The Origin of Information Theory Concepts in
Communication Technology: Case Study for Engineering Theory-
Building in Industrial and Military Research], Doctoral Dissertation,
Free Univ. Berlin, Nov. 8, 1979, 570 pp. Included in Part A.
[3] "An Algebra for Theoretical Genetics," Ph.D. Dissertation, Department
of Mathematics, Massachusetts Institute of Technology, April 15, 1940,
69 pp. Included in Part C.
[4] "A Theorem on Color Coding," Memorandum 40-130-153, July 8,
1940, Bell Laboratories. Superseded by "A Theorem on Coloring the
Lines of a Network." Not included.
[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender,"
Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp.
[7] "A Study of the Deflection Mechanism and Some Results on Rate
Finders," Report to National Defense Research Committee, Div. 7-311-M1,
circa April, 1941, 37 pp. + 15 figs. Included in this volume.
[8] "Backlash in Overdamped Systems," Report to National Defense
Research Committee, Princeton Univ., May 14, 1941, 6 pp. Abstract
only included in Part B.
[9] "A Height Data Smoothing Mechanism," Report to National Defense
Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941,
9 pp. + 9 figs. Included in this volume.
[10] "The Theory of Linear Differential and Smoothing Operators," Report
to National Defense Research Committee, Div. 7-313.1-M1, Princeton
Univ., June 8, 1941, 11 pp. Not included.
[11] "Some Experimental Results on the Deflection Mechanism," Report to
National Defense Research Committee, Div. 7-311-M1, June 26, 1941,
11 pp. Included in this volume.
B.
[12] "Criteria for Consistency and Uniqueness in Relay Circuits,"
Typescript, Sept. 8, 1941, 5 pp. + 3 figs. Included in this volume.
[13] "The Theory and Design of Linear Differential Equation Machines,"
Report to the Services 20, Div. 7-311-M2, Jan. 1942, Bell Laboratories,
73 pp. + 30 figs. Included in Part B.
[14] (With John Riordan) "The Number of Two-Terminal Series-Parallel
Networks," Journal of Mathematics and Physics, Vol. 21 (August,
1942), pp. 83-93. Included in Part B.
[15] "Analogue of the Vernam System for Continuous Time Series,"
Memorandum MM 43-110-44, May 10, 1943, Bell Laboratories, 4 pp. +
4 figs. Included in Part A.
[16] (With W. Feller) "On the Integration of the Ballistic Equations on the
Aberdeen Analyzer," Applied Mathematics Panel Report No. 28.1,
National Defense Research Committee, July 15, 1943, 9 pp. Included in
this volume.
[17] "Pulse Code Modulation," Memorandum MM 43-110-43, December 1,
1943, Bell Laboratories. Not included.
[18] "Feedback Systems with Periodic Loop Closure," Memorandum MM
44-110-32, March 16, 1944, Bell Laboratories. Not included.
[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29,
1944, Bell Laboratories, 2 pp. + 3 Figs. Included in this volume.
[20] "Counting Up or Down With Pulse Counters," Typescript, May 31,
1944, Bell Laboratories, 1 p. + 1 fig. Included in this volume.
[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver,"
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11
figs. Included in this volume.
[22] "The Best Detection of Pulses," Memorandum MM 44-110-28, June 22,
1944, Bell Laboratories, 3 pp. Included in Part A.
[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses,"
Typescript, August 4, 1944, Bell Laboratories, 4 pp. Included in this
volume.
[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-
110-02, Sept. 1, 1945, Bell Laboratories, 114 pp. + 25 figs. Superseded
by the following paper. Included in this volume.
[25] "Communication Theory of Secrecy Systems," Bell System Technical
Journal, Vol. 28 (1949), pp. 656-715. "The material in this paper
appeared originally in a confidential report 'A Mathematical Theory of
Cryptography', dated Sept. 1, 1945, which has now been declassified."
Included in Part A.
[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945,
Bell Laboratories, 17 pp. Included in this volume.
[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and
Prediction in Fire-Control Systems," Summary Technical Report,
Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control,
Washington, DC, 1946, pp. 71-159 and 166-167. AD 200795. Also in
National Military Establishment Research and Development Board,
Report #13 MGC 12/1, August 15, 1948. Superseded by [51] and by R.
B. Blackman, Linear Data-Smoothing and Prediction in Theory and
Practice, Addison-Wesley, Reading, Mass., 1965. Included in this
volume.
[28] (With B. M. Oliver) "Communication System Employing Pulse Code
Modulation," Patent 2,801,281. Filed Feb. 21, 1946, granted July 30,
1957. Not included.
[29] (With B. D. Holbrook) "A Sender Circuit For Panel or Crossbar
Telephone Systems," Patent application circa 1946, application dropped
April 13, 1948. Not included.
[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of
Four-Terminal Unilateral Linear Networks Connected in Tandem,"
Memorandum MM 46-110-49, April 10, 1946, Bell Laboratories, 34 pp.
+ 16 figs. Included in this volume.
[31] "Electronic Methods in Telephone Switching," Typescript, October 17,
1946, Bell Laboratories, 5 pp. + 1 fig. Included in this volume.
[32] "Some Generalizations of the Sampling Theorem," Typescript, March
4, 1948, 5 pp. + 1 fig. Included in this volume.
[33] (With J. R. Pierce and J. W. Tukey) "Cathode-Ray Device," Patent
2,576,040. Filed March 10, 1948, granted Nov. 20, 1951. Not included.
[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15,
1948, 5 pp. Included in this volume.
[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March
15, 1948, 2 pp. Included in this volume.
[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.
Included in this volume.
[37] "A Mathematical Theory of Communication," Bell System Technical
Journal, Vol. 27 (July and October 1948), pp. 379-423 and 623-656.
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[38] (With Warren Weaver) The Mathematical Theory of Communication,
University of Illinois Press, Urbana, IL, 1949, vi + 117 pp. Reprinted
(and repaginated) 1963. The section by Shannon is essentially identical
to the previous item. Not included.
[39] (With Warren Weaver) Mathematische Grundlagen der
Informationstheorie, Scientia Nova, Oldenbourg Verlag, Munich, 1976,
pp. 143. German translation of the preceding book. Not included.
[40] (With B. M. Oliver and J. R. Pierce) "The Philosophy of PCM,"
Proceedings Institute of Radio Engineers, Vol. 36 (1948), pp. 1324-
1331. (Received May 24, 1948.) Included in Part A.
[41] "Samples of Statistical English," Typescript, June 11, 1948, Bell
Laboratories, 3 pp. Included in this volume.
[42] "Network Rings," Typescript, June 11, 1948, Bell Laboratories, 26 pp.
+ 4 figs. Included in Part B.
[43] "Communication in the Presence of Noise," Proceedings Institute of
Radio Engineers, Vol. 37 (1949), pp. 10-21. (Received July 23, 1940
[1948?].) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Reprinted
in Proceedings Institute of Electrical and Electronic Engineers, Vol. 72
(1984), pp. 1192-1201. Included in Part A.
[44] "A Theorem on Coloring the Lines of a Network," Journal of
Mathematics and Physics, Vol. 28 (1949), pp. 148-151. (Received Sept.
14, 1948.) Included in Part B.
[45] "Significance and Application [of Communication Research],"
Symposium on Communication Research, 11-13 October, 1948, Research
and Development Board, Department of Defense, Washington, DC, pp.
14-23, 1948. Included in this volume.
[46] "Note on Certain Transcendental Numbers," Typescript, October 27,
1948, Bell Laboratories, 1 p. Included in this volume.
[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript,
Nov. 18, 1948, Bell Laboratories, 2 pp. Included in this volume.
[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6
1948, Bell Laboratories, 2 pp. + 2 Figs. Included in this volume.
[49] "Information Theory," Typescript of abstract of talk for American
Statistical Society, 1949, 5 pp. Included in this volume.
[50] "The Synthesis of Two-Terminal Switching Circuits," Bell System
Technical Journal, Vol. 28 (Jan., 1949), pp. 59-98. Included in Part B.
[51] (With H. W. Bode) "A Simplified Derivation of Linear Least Squares
Smoothing and Prediction Theory," Proceedings Institute of Radio
Engineers, Vol. 38 (1950), pp. 417-425. (Received July 13, 1949.)
Included in Part B.
[52] "Review of Transformations on Lattices and Structures of Logic by
Stephen A. Kiss," Proceedings Institute of Radio Engineers, Vol. 37
(1949), p. 1163. Included in Part B.
[53] "Review of Cybernetics, or Control and Communication in the Animal
and the Machine by Norbert Wiener," Proceedings Institute of Radio
Engineers, Vol. 37 (1949), p. 1305. Included in Part B.
[54] "Programming a Computer for Playing Chess," Philosophical
Magazine, Series 7, Vol. 41 (No. 314, March 1950), pp. 256-275.
(Received Nov. 8, 1949.) Reprinted in D. N. L. Levy, editor, Computer
Chess Compendium, Springer-Verlag, NY, 1988. Included in Part B.
[55] "A Chess-Playing Machine," Scientific American, Vol. 182 (No. 2,
February 1950), pp. 48-51. Reprinted in The World of Mathematics,
edited by James R. Newman, Simon and Schuster, NY, Vol. 4, 1956, pp.
2124-2133. Included in Part B.
[56] "Memory Requirements in a Telephone Exchange," Bell System
Technical Journal, Vol. 29 (1950), pp. 343-349. (Received Dec. 7,
1949.) Included in Part B.
[57] "A Symmetrical Notation for Numbers," American Mathematical
Monthly, Vol. 57 (Feb., 1950), pp. 90-93. Included in Part B.
[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell
Laboratories, 2 pp. Included in this volume.
[59] "A Digital Method of Transmitting Information," Typescript, no date,
circa 1950, Bell Laboratories, 3 pp. Included in this volume.
[60] "Communication Theory — Exposition of Fundamentals," in "Report
of Proceedings, Symposium on Information Theory, London, Sept.,
1950," Institute of Radio Engineers, Transactions on Information
Theory, No. 1 (February, 1953), pp. 44-47. Included in Part A.
[61] "General Treatment of the Problem of Coding," in "Report of
Proceedings, Symposium on Information Theory, London, Sept., 1950,"
Institute of Radio Engineers, Transactions on Information Theory, No. 1
(February, 1953), pp. 102-104. Included in Part A.
[62] "The Lattice Theory of Information," in "Report of Proceedings,
Symposium on Information Theory, London, Sept., 1950," Institute of
Radio Engineers, Transactions on Information Theory, No. 1 (February,
1953), pp. 105-107. Included in Part A.
[63] (With E. C. Cherry, S. H. Moss, Dr. Uttley, I. J. Good, W. Lawrence and
W. P. Anderson) "Discussion of Preceding Three Papers," in "Report
of Proceedings, Symposium on Information Theory, London, Sept.,
1950," Institute of Radio Engineers, Transactions on Information
Theory, No. 1 (February, 1953), pp. 169-174. Included in Part A.
[64] "Review of Description of a Relay Computer, by the Staff of the
[Harvard] Computation Laboratory," Proceedings Institute of Radio
Engineers, Vol. 38 (1950), p. 449. Included in Part B.
[65] "Recent Developments in Communication Theory," Electronics, Vol.
23 (April, 1950), pp. 80-83. Included in Part A.
[66] German translation of [65], in Tech. Mitt. P.T.T., Bern, Vol. 28 (1950),
pp. 337-342. Not included.
[67] "A Method of Power or Signal Transmission To a Moving Vehicle,"
Memorandum for Record, July 19, 1950, Bell Laboratories, 2 pp. + 4
figs. Included in Part B.
[68] "Some Topics in Information Theory," in Proceedings International
Congress of Mathematicians (Cambridge, Mass., Aug. 30 - Sept. 6, 1950),
American Mathematical Society, Vol. II (1952), pp. 262-263. Included
in Part A.
[69] "Prediction and Entropy of Printed English," Bell System Technical
Journal, Vol. 30 (1951), pp. 50-64. (Received Sept. 15, 1950.)
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[70] "Presentation of a Maze Solving Machine," in Cybernetics: Circular,
Causal and Feedback Mechanisms in Biological and Social Systems,
Transactions Eighth Conference, March 15-16, 1951, New York, N.Y.,
edited by H. von Foerster, M. Mead and H. L. Teuber, Josiah Macy Jr.
Foundation, New York, 1952, pp. 169-181. Included in Part B.
[71] "Control Apparatus," Patent application Aug. 1951, dropped Jan. 21,
1954. Not included.
[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10
pp. Included in this volume.
[73] "A Mind-Reading (?) Machine," Typescript, March 18, 1953, Bell
Laboratories, 4 pp. Included in Part B.
[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM
53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs. Included
in this volume.
[75] "The Potentialities of Computers," Typescript, April 3, 1953, Bell
Laboratories. Included in Part B.
[76] "Throbac I," Typescript, April 9, 1953, Bell Laboratories, 5 pp.
Included in Part B.
[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell
Laboratories, 7 pp. Included in this volume.
[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.
Included in this volume.
[79] (With E. F. Moore) "Electrical Circuit Analyzer," Patent 2,776,405.
Filed May 18, 1953, granted Jan. 1, 1957. Not included.
[80] (With E. F. Moore) "Machine Aid for Switching Circuit Design,"
Proceedings Institute of Radio Engineers, Vol. 41 (1953), pp. 1348-
1351. (Received May 28, 1953.) Included in Part B.
[81] "Mathmanship or How to Give an Explicit Solution Without Actually
Solving the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.
Included in this volume.
[82] "Computers and Automata," Proceedings Institute of Radio Engineers,
Vol. 41 (1953), pp. 1234-1241. (Received July 17, 1953.) Reprinted in
Methodos, Vol. 6 (1954), pp. 115-130. Included in Part B.
[83] "Realization of All 16 Switching Functions of Two Variables Requires
18 Contacts," Memorandum MM 53-1400-40, November 17, 1953, Bell
Laboratories, 4 pp. + 2 figs. Included in Part B.
[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum
MM 53-140-52, November 30, 1953, Bell Laboratories, 26 pp. + 5 figs.
Included in this volume.
[85] (With D. W. Hagelbarger) "A Relay Laboratory Outfit for Colleges,"
Memorandum MM 54-114-17, January 10, 1954, Bell Laboratories.
Included in Part B.
[86] "Efficient Coding of a Binary Source With One Very Infrequent
Symbol," Memorandum MM 54-114-7, January 29, 1954, Bell
Laboratories. Included in Part A.
[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude
Limited Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1
Fig. Included in this volume.
[88] (With Edward F. Moore) "Reliable Circuits Using Crummy Relays,"
Memorandum 54-114-42, Nov. 29, 1954, Bell Laboratories. Published
as the following two items.
[89] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays
I," Journal Franklin Institute, Vol. 262 (Sept., 1956), pp. 191-208.
Included in Part B.
[90] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays
II," Journal Franklin Institute, Vol. 262 (Oct., 1956), pp. 281-297.
Included in Part B.
[91] (Edited jointly with John McCarthy) Automata Studies, Annals of
Mathematics Studies Number 34, Princeton University Press, Princeton,
NJ, 1956, ix + 285 pp. The Preface, Table of Contents, and the two
papers by Shannon are included in Part B.
[92] (With John McCarthy), Studien zur Theorie der Automaten, Munich,
1974. (German translation of the preceding work.)
[93] "A Universal Turing Machine With Two Internal States," Memorandum
54-114-38, May 15, 1954, Bell Laboratories. Published in Automata
Studies, pp. 157-165. Included in Part B.
[94] (With Karel de Leeuw, Edward F. Moore and N. Shapiro)
"Computability by Probabilistic Machines," Memorandum 54-114-37,
Oct. 21, 1954, Bell Laboratories. Published in [91], pp. 183-212.
Included in Part B.
[95] "Concavity of Transmission Rate as a Function of Input Probabilities,"
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories. Included
in this volume.
[96] "Some Results on Ideal Rectifier Circuits," Memorandum MM 55-114-29,
June 8, 1955, Bell Laboratories. Included in Part B.
[97] "The Simultaneous Synthesis of s Switching Functions of n Variables,"
Memorandum MM 55-114-30, June 8, 1955, Bell Laboratories. Included
in Part B.
[98] (With D. W. Hagelbarger) "Concavity of Resistance Functions,"
Journal Applied Physics, Vol. 27 (1956), pp. 42-43. (Received August 1,
1955.) Included in Part B.
[99] "Game Playing Machines," Journal Franklin Institute, Vol. 260 (1955),
pp. 447-453. (Delivered Oct. 19, 1955.) Included in Part B.
[100] "Information Theory," Encyclopedia Britannica, Chicago, IL, 14th
Edition, 1968 printing, Vol. 12, pp. 246B-249. (Written circa 1955.)
Included in Part A.
[101] "Cybernetics," Encyclopedia Britannica, Chicago, IL, 14th Edition,
1968 printing, Vol. 12. (Written circa 1955.) Not included.
[102] "The Rate of Approach to Ideal Coding (Abstract)," Proceedings
Institute of Radio Engineers, Vol. 43 (1955), p. 356. Included in Part A.
[103] "The Bandwagon (Editorial)," Institute of Radio Engineers,
Transactions on Information Theory, Vol. IT-2 (March, 1956), p. 3.
Included in Part A.
[104] "Information Theory," Seminar Notes, Massachusetts Institute of
Technology, 1956 and succeeding years. Included in this volume.
Contains the following sections:
"A skeleton key to the information theory notes," 3 pp. "Bounds on the
tails of martingales and related questions," 19 pp. "Some useful
inequalities for distribution functions," 3 pp. "A lower bound on the
tail of a distribution," 9 pp. "A combinatorial theorem," 1 p. "Some
results on determinants," 3 pp. "Upper and lower bounds for powers of
a matrix with non-negative elements," 3 pp. "The number of sequences
of a given length," 3 pp. "Characteristic for a language with
independent letters," 4 pp. "The probability of error in optimal codes,"
5 pp. "Zero error codes and the zero error capacity C0," 10 pp.
"Lower bound for Pef for a completely connected channel with
feedback," 1 p. "A lower bound for Pe when R > C," 2 pp. "A lower
bound for Pe," 2 pp. "Lower bound with one type of input and many
types of output," 3 pp. "Application of 'sphere-packing' bounds to
feedback case," 8 pp. "A result for the memoryless feedback channel,"
1 p. "Continuity of Pe opt as a function of transition probabilities," 1 p.
"Codes of a fixed composition," 1 p. "Relation of Pe to p," 2 pp.
"Bound on Pe for random code by simple threshold argument," 4 pp.
"A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2
pp. "Relations between probability and minimum word separation," 4
pp. "Inequalities for decodable codes," 3 pp. "Convexity of channel
capacity as a function of transition probabilities," 1 p. "A geometric
interpretation of channel capacity," 6 pp. "Log moment generating
function for the square of a Gaussian variate," 2 pp. "Upper bound on
Pe for Gaussian channel by expurgated random code," 2 pp. "Lower
bound on Pe in Gaussian channel by minimum distance argument," 2
pp. "The sphere packing bound for the Gaussian power limited
channel," 4 pp. "The n-terminal channel," 7 pp. "Conditions for
constant mutual information," 2 pp. "The central limit theorem with
large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior
of the distribution function," 5 pp. "Generalized Chebycheff and
Chernoff inequalities," 1 p. "Channels with side information at the
transmitter," 13 pp. "Some miscellaneous results in coding theory," 15
pp. "Error probability bounds for noisy channels," 20 pp.
[105] "Reliable Machines from Unreliable Components," notes of five
lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp. Not
included.
[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes
taken by W. W. Peterson, Massachusetts Institute of Technology, Spring,
1956, 8 pp. Included in this volume.
[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel,"
notes of a lecture, Massachusetts Institute of Technology, Aug. 30, 1956,
3 pp. Included in this volume.
[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a
lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
Included in this volume.
[109] "The Zero Error Capacity of a Noisy Channel," Institute of Radio
Engineers, Transactions on Information Theory, Vol. IT-2 (September,
1956), pp. S8-S19. Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[110] (With Peter Elias and Amiel Feinstein) "A Note on the Maximum Flow
Through a Network," Institute of Radio Engineers, Transactions on
Information Theory, Vol. IT-2 (December, 1956), pp. 117-119.
(Received July 11, 1956.) Included in Part B.
[111] "Certain Results in Coding Theory for Noisy Channels," Information
and Control, Vol. 1 (1957), pp. 6-25. (Received April 22, 1957.)
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[112] "Geometrische Deutung einiger Ergebnisse bei die Berechnung der
Kanal Capazitat" [Geometrical meaning of some results in the
calculation of channel capacity], Nachrichtentechnische Zeit. (N.T.Z.),
Vol. 10 (No. 1, January 1957), pp. 1-4. Not included, since the English
version is included.
[113] "Some Geometrical Results in Channel Capacity," Verband Deutsche
Elektrotechniker Fachber., Vol. 19 (II) (1956), pp. 13-15 =
Nachrichtentechnische Fachber. (N.T.F.), Vol. 6 (1957). English version
of the preceding work. Included in Part A.
[114] "Von Neumann's Contribution to Automata Theory," Bulletin American
Mathematical Society, Vol. 64 (No. 3, Part 2, 1958), pp. 123-129.
(Received Feb. 10, 1958.) Included in Part B.
[115] "A Note on a Partial Ordering for Communication Channels,"
Information and Control, Vol. 1 (1958), pp. 390-397. (Received March
24, 1958.) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[116] "Channels With Side Information at the Transmitter," IBM Journal
Research and Development, Vol. 2 (1958), pp. 289-293. (Received Sept.
15, 1958.) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[117] "Probability of Error for Optimal Codes in a Gaussian Channel," Bell
System Technical Journal, Vol. 38 (1959), pp. 611-656. (Received Oct.
17, 1958.) Included in Part A.
[118] "Coding Theorems for a Discrete Source With a Fidelity Criterion,"
Institute of Radio Engineers, International Convention Record, Vol. 7
(Part 4, 1959), pp. 142-163. Reprinted with changes in Information and
Decision Processes, edited by R. E. Machol, McGraw-Hill, NY, 1960,
pp. 93-126. Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[119] "Two-Way Communication Channels," in Proceedings Fourth Berkeley
Symposium Probability and Statistics, June 20 - July 30, 1960 , edited by
J. Neyman, Univ. Calif. Press, Berkeley, CA, Vol. 1, 1961, pp. 611-644.
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[120] "Computers and Automation — Progress and Promise in the Twentieth
Century," Man, Science, Learning and Education. The Semicentennial
Lectures at Rice University , edited by S. W. Higginbotham, Supplement
2 to Vol. XLIX, Rice University Studies, Rice Univ., 1963, pp. 201-211.
Included in Part B.
[121] Papers in Information Theory and Cybernetics (in Russian), Izd. Inostr.
Lit., Moscow, 1963, 824 pp. Edited by R. L. Dobrushin and O. B.
Lupanova, preface by A. N. Kolmogorov. Contains Russian translations
of [1], [6], [14], [25], [37], [40], [43], [44], [50], [51], [54]-[56], [65],
[68]-[70], [80], [82], [89], [90], [93], [94], [99], [103], [109]-[111],
[113]-[119].
[122] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error
Probability for Coding on Discrete Memoryless Channels I,"
Information and Control, Vol. 10 (1967), pp. 65-103. (Received Jan. 18,
1966.) Reprinted in D. Slepian, editor, Key Papers in the Development
of Information Theory, IEEE Press, NY, 1974. Included in Part A.
[123] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error
Probability for Coding on Discrete Memoryless Channels II,"
Information and Control, Vol. 10 (1967), pp. 522-552. (Received Jan.
18, 1966.) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the
American Driver in England," typescript, All Souls College, Oxford,
Trinity term, 1978, 7 pp. + 8 figs. Included in this volume.
[125] "Claude Shannon's No-Drop Juggling Diorama," Juggler's World, Vol.
34 (March, 1982), pp. 20-22. Included in Part B.
[126] "Scientific Aspects of Juggling," Typescript, circa 1980. Included in
Part B.
[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp. Included in
this volume.
Cover Sheet for Technical Memoranda
Research Department
Subject: The Use of the Lakatos-Hickman Relay in a
Subscriber Sender - Case 20878
ROUTING:
1 - Patent Dept. (letter 9/27/40)
2 - W.W.Ke^all, Case File
3 - T.C.Fry
4 - A.B.Clark
5 - B.D.Holbrook
6 - G.R.Stibitz
7 - G.V.King
8 - Miss Hanle
MM-40-130-179
Date: August 13, 1940
Author: C.E.Shannon
INDEX NO. S4.2
ABSTRACT
A study is made of the possibilities of using
the Lakatos-Hickman type relay for the counting, registering,
steering, and pulse apportioning operations in
a subscriber sender. Circuits are shown for the more
important parts of the circuit where it appears that the
new type relay would effect an economy.
The Use of the Lakatos-Hickman Relay in a Subscriber Sender -
Case 20878
August 15, 1940
MEMORANDUM FOR FILE
The Lakatos-Hickman type relay,¹ using the relay springs
as part of the magnetic circuit, can be used as a very economical
type of pulse counter and registration device. In fact, one such
relay with twenty moving springs can count and register up to ten
pulses, while the same operation requires at least five ordinary
relays, and some standard circuits use as many as twenty to reduce
the spring loading on the relays and the contact loading in
the pulsing circuit. It has been suggested that this new type
of relay might be used for some or all of the many counting,
steering, and registration circuits in a subscriber type sender.
The present memorandum gives some circuits for accomplishing
this. The chief problem in the design of these circuits is
that of performing the various translating operations necessary
in converting the incoming pulses into group and brush selections,
or P.C.I. pulses as the case may be, without using more contact
elements than are available on the counting relay. Two different
solutions are given here. The first was made as economical as
possible but at the cost of one disadvantage. Under certain
conditions of contact failure in the thousands or hundreds register
the sender will connect the subscriber to an incorrect number
rather than connecting to a tell-tale and giving him a busy signal.
The second circuit, which we will call the positive action
circuit,² is designed to overcome this difficulty but does so at
the expense of more contacts and wiring. Some compromise between
these circuits may be the most desirable. The circuits by no
means represent a complete sender. It appears that the problems
connected with the office code (i.e. the first two or three
digits) can be handled without much difficulty. At any rate
these circuits will depend on the type of decoder used, and
would represent a second stage in the design. We have therefore
designed what might be called a "four digit sender," considering
only the problems arising in the thousands, hundreds, tens and
units digits. We also have omitted consideration of the parts
of the circuit used for control and supervisory purposes, since
these can be easily handled by existing circuits, and do not
directly involve the new type relay. Our chief purpose is to
show that the new type counter contains sufficient contact
elements for most of the steering and counting circuits of the
subscriber sender. It is always possible to add more contacts
at any stage in the new type counter by the arrangement of
springs in Fig. 1, but this would be undesirable from the
standpoint of standardization. At any rate it was found that
even in the positive action circuit, only two stages in one
register needed more contacts than are already available, and
two additional ordinary relays were introduced here to carry the
contact load.
¹See "Circuit Analysis for Lakatos-Hickman Type Relay",
G. R. Stibitz, MM40-150-1BO, Jan. 15, 1940, Case 20878.
²This circuit was suggested by Mr. G. V. King.
It should be pointed out that an extremely simple and
economical sender (i.e., much simpler than those given here)
could be designed using the new type counter were it not for
the peculiar translation codes involved. Thus if we could start
"from scratch" and design translation codes particularly adapted
to the characteristics of the new relay, the circuits could be
made very simple indeed. Even using the existing codes which
were constructed to simplify the present type circuits, the use
of the new counter allows a remarkable simplicity and economy.
The circuits were designed by a combination of common
sense and Boolean algebra methods. We will omit the details
involved in their design. Although it is possible that a few
superfluous elements remain, it is doubtful if they can be
simplified very much.
Figure 2 is a block diagram of the proposed sender.
In the present panel and crossbar senders, pulse counting is
done in the same circuit for each digit and the numbers transferred
from this counting circuit to a set of registering circuits,
one for each digit, through an incoming steering chain.
The registering circuits in the panel type sender consist of a
set of five ordinary relays per digit, while in the crossbar
system a digit is registered on one or two verticals of a
crossbar switch. In Figure 2, on the other hand, each digit
has one of the new type counter relays which acts both as a
pulse counter and as a register. The incoming steering chain
steers the incoming pulses to the correct counter-register
rather than steering the number recorded by the input pulse
counter to a digit register. The input steering chain may or
may not be one of the new type counters. The steering operation
can be done with the new type counter, but it appears to
require special devices, as for example polarized springs, in
order to energize both windings of the register relays after
receiving a digit. Even using the present type of steering
chain a great simplification is possible, for only one wire,
the pulsing lead, needs to be steered to the various digit
registers, rather than the five leads of the present type
sender. Another possibility is using a new type counter to
count the groups of pulses and operate a set of relays S_A, S_B,
S_C, S_Th, S_H, S_T, S_U which come in after the A, B, C, Th, H, T,
and U digits are received and energize both coils of the
corresponding registers.
After the digits are registered on the new type
counters, these numbers are translated by means of the contact
interconnections into the code corresponding to the incoming
brush, incoming group, final brush, tens, and units selections,
which are represented by a ground on one of the leads in the
groups marked IB, IG, FB, T, and U, respectively. These groups
of leads are connected in sequence to the revertive pulse counter
by means of the revertive group counter. The revertive pulse
counter will be one of the new type relays and is connected in
such a way as to open the fundamental circuit and thus stop the
revertive pulsing when it reaches the first ground. The revertive
group counter or revertive steering chain, of course, steps ahead
after each group of revertive pulses through the action of a slow
release relay. This last steering operation cannot be done solely
with one of the new type relays for it is necessary to steer ten
leads in the tens and units digits. It could be done, however,
with a new type counter in conjunction with four ordinary relays.
In the case of a call to a manual office the outputs
of the digit registers are translated by a P.C.I. circuit into
the correct P.C.I. codes. This circuit, too, can make use of the
new type counter in the quadranting operation, i.e. in apportioning
four quadrants to each of the four digits to be transmitted.
This would be done with a sixteen stage counter (or if it is
desirable to have all counters with ten stages, two of these could
be connected "in series") replacing the present sequence switch.
Of course there must be an interlock between the incoming
and revertive steering chains to prevent any selection being
made before sufficient information has been received. This can
be done by fairly standard methods.
A rough comparison can be made between the relay requirements
of the present panel type sender and the design proposed
here. Omitting parts of the circuit which would be substantially
the same the requirements are listed below:
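                      Present
                      Panel Sender     Proposed Sender
                      Ordinary         New Type       Ordinary
Operation             Relays           Counters       Relays
Input Counting        [illegible]      [illegible]    [illegible]
Input Steering        [illegible]      [illegible]    [illegible]
Registration          [illegible]      [illegible]    [illegible]
Revertive Counting    [illegible]      [illegible]    [illegible]
Revertive Steering    10               [illegible]    [illegible]
Total                 [illegible]      [illegible]    [illegible]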
In addition, a sequence switch is replaced by a new type counter.
These figures are based on the positive action circuit. The
other circuit uses 6 ordinary relays. This comparison of the
numbers of relays involved shows only a small part of the saving,
however. The wiring and fundamental method of operation of the
new circuit is much simpler which tends both toward economy and,
providing the new relay can be made sufficiently reliable,
elimination of faults and errors.
It is a little more difficult to give a quantitative
comparison of the proposed sender with the present crossbar type
sender due to the differences in the types of circuit elements
involved, but it appears that the saving would be of the same order
of magnitude.
The new type counter with ten stages acts like a series
of twenty relays which come in sequentially as the two coils of
the relay are alternately energized. Thus after n pulses the
first 2n relays are operated. If, after a series of pulses only
one of the two coils on a counter remains energized we can only
be sure of the contacts on that side. It was found that under
these conditions the number of contacts available was far too
small in all of the four registers for the various translating
operations necessary. We have therefore assumed the steering
circuit should be designed in such a way as to energize both
coils of a counter after it has received its series of pulses.*
This insures the contacts on both sides and each stage then has
the equivalent of two transfer contacts and two additional
contacts somewhat similar to a switchhook connection. Thus each
stage may be considered as a relay with the contacts available
indicated in Figure 3. Our circuit diagrams are drawn from
this point of view.
For the convenience of the reader we will list the
various translation codes used in the sender. The incoming
brush selection depends only on the thousands digit and is
given by the following table:
Thousands        Incoming Brush
Digit            Selection
0, 1             0
2, 3             1
4, 5             2
6, 7             3
8, 9             4
*See the memorandum "Circuit Arrangement for Counting Relay with
Mechanically Independent Contact Springs", by B. D. Holbrook,
MM-40-130-149, July 5, 1940, Case 22108-1.
The incoming group selection depends on both the
hundreds and thousands digits and is given by the following:
Thousands        Hundreds         Incoming Group
Digit            Digit            Selection
even             < 5              0
even             ≥ 5              1
odd              < 5              2
odd              ≥ 5              3
The final brush selection depends only on the hundreds
digit. We have the following code:
Hundreds         Final Brush
Digit            Selection
0, 5             0
1, 6             1
2, 7             2
3, 8             3
4, 9             4
It should be remembered that an incoming brush, incoming
group, or final brush selection of n corresponds to n + 1
revertive pulses. The same remark applies to the tens and
hundreds selection.
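As an illustration only (it is not part of the memorandum), the translation for a full mechanical call can be sketched in Python. The sketch assumes the code tables read as follows: the incoming brush is the thousands digit divided by two with the remainder discarded, the incoming group is 0 to 3 according as the thousands digit is even or odd and the hundreds digit is below 5 or not, and the final brush is the hundreds digit modulo 5. The function and key names are hypothetical.

```python
def full_mechanical_selections(number: int) -> dict:
    """Translate a four-digit line number into panel selections.

    Assumed reading of the code tables (see the lead-in):
      incoming brush = thousands // 2
      incoming group = 2 * (thousands % 2) + (1 if hundreds >= 5 else 0)
      final brush    = hundreds % 5
    """
    if not 0 <= number <= 9999:
        raise ValueError("expected a four-digit number, 0000-9999")
    thousands = number // 1000
    hundreds = (number // 100) % 10
    tens = (number // 10) % 10
    units = number % 10
    return {
        "incoming_brush": thousands // 2,
        "incoming_group": 2 * (thousands % 2) + (1 if hundreds >= 5 else 0),
        "final_brush": hundreds % 5,
        "tens": tens,
        "units": units,
    }


def revertive_pulses(selection: int) -> int:
    """A selection of n corresponds to n + 1 revertive pulses."""
    return selection + 1


if __name__ == "__main__":
    for name, value in full_mechanical_selections(6789).items():
        print(f"{name}: selection {value}, {revertive_pulses(value)} pulses")
```

Under these assumptions the number 6789 gives incoming brush 3, incoming group 1 and final brush 2, so the three selections together single out one block of one hundred lines.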
Digits are sent to a call indicator by series of positive
and negative pulses, four for each digit. Two different
codes are used for this, one for the thousands digit and the
other for the hundreds, tens, and units. The thousands code is
an additive one based on the numbers 1, 2, 4, and 8 as follows:
[Table: P.C.I. Code for Thousands Digit. Rows: thousands digits 1-9
and 0; columns: quadrants I, II, III, IV, with a final row of
corresponding additive numbers. The entries are illegible in the scan.]
The sum of the numbers corresponding to the columns in which a
digit has the symbol - gives that digit, hence the additive
property of the code. In this table I, II, III, and IV refer
to the four pulses or quadrants. In the first and third quadrants
0 represents a ground and a - represents a positive pulse. In the
even quadrants 0 means a light negative pulse and the -, a heavy
negative pulse. We have chosen this representation of the code
for comparison with the P.C.I. circuit in which four leads are
grounded or not in accordance with the above table. Thus if the
digit 8 is registered in the thousands place, leads II and III in
a group I, II, III, IV are grounded. The presence or absence of
these grounds are translated into positive or negative pulses by
two relays TS and RS.
The hundreds, tens, and units P.C.I. code is also additive,
based on the numbers 1, 2, 4, 5. Using the same conventions
it is represented by the following table:
P.C.I. Code for Hundreds, Tens, and Units Digits
H, T, or              Quadrant
U Digit          I     II    III   IV
1                -     0     0     0
2                0     -     0     0
3                -     -     0     0
4                0     0     -     0
5                0     0     0     -
6                -     0     0     -
7                0     -     0     -
8                -     -     0     -
9                0     0     -     -
0                0     0     0     0
Corresponding
Numbers         (1)   (2)   (4)   (5)
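The additive property of the hundreds, tens, and units code can be checked mechanically. The Python sketch below is an illustration, not part of the memorandum. It assumes that quadrants I to IV carry the numbers 1, 2, 4, 5 and that the rows are as written in the dictionary, with "-" marking a quadrant that contributes to the sum and "0" marking a grounded lead. All names are hypothetical.

```python
QUADRANTS = ("I", "II", "III", "IV")
WEIGHTS = (1, 2, 4, 5)  # assumed additive numbers for quadrants I-IV

# Assumed rows of the H, T, U code table: one mark per quadrant.
HTU_CODE = {
    1: "-000", 2: "0-00", 3: "--00", 4: "00-0", 5: "000-",
    6: "-00-", 7: "0-0-", 8: "--0-", 9: "00--", 0: "0000",
}


def decode(marks: str) -> int:
    """Sum the weights of the quadrants carrying the symbol '-'."""
    return sum(w for w, m in zip(WEIGHTS, marks) if m == "-")


def grounded_leads(digit: int) -> list:
    """Leads marked 0 in the table, i.e. grounded in the P.C.I. circuit."""
    return [q for q, m in zip(QUADRANTS, HTU_CODE[digit]) if m == "0"]


if __name__ == "__main__":
    for digit, marks in HTU_CODE.items():
        assert decode(marks) == digit
    print("every row sums to its digit")
    print("digit 6 grounds leads", grounded_leads(6))
```

With this reading a registered 6 grounds leads II and III, which agrees with the example given in the text for a 6 dialed in the tens place.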
The circuit for the tens or units register is shown in Figure 4.
The operation is quite obvious. In the case of a full mechanical
call, if 6 for example were dialed in the tens place, the first
six relays are locked in, which places a ground on the lead marked
6. These are connected through the revertive steering chain to
the revertive counter which reaches this ground after the seventh
revertive pulse. The presence of this ground operates a relay
which opens the fundamental circuit and stops the pulsing.
A ground is also put on leads II and III for a P.C.I. call.
The operation of the P.C.I. circuit will be described later.
The thousands and hundreds register is shown in Figure 5 for the
positive action circuit and in Figure 6 for the more economical
circuit. In Figure 5, many of the contacts do double duty,
translating both for P.C.I. and full mechanical calls. This is
done through a relay P which is operated for a manual call and
not for a mechanical call. In the hundreds register there were
not enough contacts available in the fifth and tenth stages.
The relays R and S are used to carry part of the contact load.
This circuit is designed so that one and only one of the IB, IG,
and FB leads is grounded for a given number. In case of a contact
failure none would be grounded and the corresponding commutator
would supposedly go to a telltale. In the circuit of Figure
6, on the other hand, more than one of the IB, IG, or FB leads may
be grounded at the same time. Thus if the thousands digit is 6,
both 3 and 4 in the IB group are grounded. If the back contact
on 3 failed the revertive pulse counter would not stop the pulsing
action at brush 3 as it should but would go on to the fourth brush.
However, this circuit is considerably simpler than Figure 5, and
does not appear worse from the standpoint of possible wrong numbers
than the present type of sender.
The P.C.I. circuit is shown in Figure 7. Z is a relay
which is operated in the odd quadrants and not in the even
quadrants. TS and RS are relays whose windings are connected
sequentially through the P.C.I. impulse chain to first the thousands
P.C.I. leads I, II, III, and IV, then the hundreds, etc. according
to the following table:
             Pulsing
Digit        Stage      Z      TS        RS
Th           1          Z      Th I      Th II
             2          -      Th III    Th II
             3          Z      Th III    Th IV
             4          -      H I       Th IV
H            5          Z      H I       H II
             6          -      H III     H II
             7          Z      H III     H IV
             8          -      T I       H IV
T            9          Z      T I       T II
             10         -      T III     T II
             11         Z      T III     T IV
             12         -      U I       T IV
U            13         Z      U I       U II
             14         -      U III     U II
             15         Z      U III     U IV
             16         -                U IV
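The regularity of the impulse chain table can be made explicit with a short Python sketch, offered as an illustration rather than as part of the memorandum. It assumes sixteen pulsing stages, Z operated on the odd stages, TS stepping through the odd-quadrant leads (I and III of each digit) and RS through the even-quadrant leads (II and IV), each relay being moved to its next lead while the other one is in use. The function name and the tuple layout are hypothetical.

```python
DIGITS = ("Th", "H", "T", "U")
QUADRANTS = ("I", "II", "III", "IV")


def pci_stage_table():
    """Return rows (stage, z_operated, ts_lead, rs_lead) for stages 1-16."""
    leads = [f"{d} {q}" for d in DIGITS for q in QUADRANTS]
    ts_leads = leads[0::2]  # quadrants I and III of each digit
    rs_leads = leads[1::2]  # quadrants II and IV of each digit
    rows = []
    for stage in range(1, 17):
        z_operated = stage % 2 == 1           # odd quadrant
        ts_index = stage // 2                 # TS advances on even stages
        rs_index = (stage - 1) // 2           # RS advances on odd stages
        ts = ts_leads[ts_index] if ts_index < len(ts_leads) else None
        rows.append((stage, z_operated, ts, rs_leads[rs_index]))
    return rows


if __name__ == "__main__":
    for stage, z, ts, rs in pci_stage_table():
        print(f"{stage:2d}  {'Z' if z else '-'}  {ts or '':7s} {rs}")
```

In every odd stage the TS lead is the quadrant being sent, and in every even stage the RS lead is; the other relay has already been connected to the lead it will need in the following stage.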
In the odd quadrants Z is operated, placing a ground on the
fundamental ring (FR). The fundamental tip (FT) is connected
through Z to either ground or positive battery according as
TS is operated or not. This depends of course on the condition
of the P.C.I. lead to which TS is connected at the time.
Similarly in the even quadrants light or heavy voltage is
applied to FR according to the condition of RS while FT is
grounded.
Figure 8 shows the revertive steering chain and
revertive pulse counter.
C. E. SHANNON
[Figures (circuit drawings on Bell Telephone Laboratories, Inc. drawing sheets) are not reproduced. Legible captions: Fig. 3; Fig. 4, Tens or Units Register; Fig. 7, P.C.I. Circuit.]
A STUDY OF THE DEFLECTION MECHANISM
AND SOME RESULTS ON RATE FINDERS
by
Claude E. Shannon
SUMMARY OF THE MOST IMPORTANT RESULTS
1. The deflection mechanism may be divided into three parts.
The first is driven by two shafts and has one shaft as out-
put, which feeds the second part. This unit has a single
shaft output which serves as input to the third part, whose
output is also a single shaft, used as the desired azimuth cor-
rection.
2. The first unit is a simple integrator. Its output rate is
3. The second part is the same circuit as previous rate finders.
Its presence appears to be detrimental to the operation of
the system from several standpoints. The output e of this part
satisfies i
• ■ x-f- y
Ll
4. The third and most important part of the machine satisfies
q + R 4 + L q - •
in which:
e = an input forcing function which, except for transients in
the second part and other small effects, is the function
whose rate is to be found.
q = the rate of e as found by the device. The output of the
mechanism is sin⁻¹ q.
R, L, S are positive constants depending on the gear ratios,
etc. in the machine.
The mechanism therefore acts like an R, L, C circuit in which
the differential inductance is a function of the current,
v 1 - q2
The system can be critically damped for differential displace-
ments near at most two values of the current.
Omitting the effect of backlash, the system is stable for any
initial conditions whatever, with a linear forcing function,
e = At + B. It will approach asymptotically and possibly with
oscillation a position where q is proportional to ė. An error
function can be found which decreases at a rate -R (q - q0)^2,
q0 being the asymptotic value of q.
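The asymptotic behavior under a linear forcing function can be illustrated numerically. The Python sketch below is not taken from the report. It assumes the simplest series R, L, C analogue with a constant inductance, L dq/dt + R q + S ∫q dt = e, which after differentiation with e = At + B becomes L q'' + R q' + S q = A. The dependence of the inductance on q described in the report is deliberately ignored, and the parameter values and names are hypothetical.

```python
def simulate_rate_finder(A, R=2.0, L=1.0, S=1.0, dt=1e-3, t_end=40.0):
    """Integrate L*q'' + R*q' + S*q = A from rest and return the final q.

    Assumption: constant inductance L (the report's nonlinear
    inductance is ignored). Uses the semi-implicit Euler method.
    """
    q = 0.0    # rate indicated by the device (analogue of current)
    dq = 0.0   # time derivative of q
    for _ in range(int(t_end / dt)):
        ddq = (A - R * dq - S * q) / L
        dq += ddq * dt   # update the derivative first
        q += dq * dt     # then advance q with the new derivative
    return q


if __name__ == "__main__":
    A = 0.5
    # R**2 == 4*L*S: the critically damped case of this linear analogue.
    print("critically damped:", simulate_rate_finder(A))
    # R**2 < 4*L*S: the approach is oscillatory but the limit is still A/S.
    print("underdamped:      ", simulate_rate_finder(A, S=2.0))
```

In this simplified model q settles at A/S, that is, at a value proportional to the rate of change of e, whether or not the approach is oscillatory.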
If the system is less than critically damped ordinary gear
play type of backlash can and will cause oscillation. This
includes play in gears, adders, lead screws, rack and pinions
and looseness of balls in the integrator carriages. The oscillation
is not unstable in the sense of being erratic, or growing
without limit, but is of a perfectly definite frequency and
amplitude. This type of backlash acts exactly like a peculiar
shaped periodic forcing function. Approximate formulas for
the frequency and amplitude of the oscillation are
r
2
and
/s2 I UoLd -A)2
<*0c
B1 and B2 being the amounts of backlash in the two driven shafts
as measured in a certain manner.
8. Elastic deformations of shafts and plates can be divided into
two parts. One is exactly equivalent to the gear type of
backlash and may be grouped with $B_1$ and $B_2$ above. The other
has the effect of altering the parameters R, L, S of the
circuit and also adding higher order derivatives with small
coefficients. This will slightly alter the time constant and
the natural frequency of the system.
9. The manner in which the arcsin function is obtained seems to
me distinctly disadvantageous to the operation of the system
for a number of reasons, chiefly since to eliminate backlash
oscillation it requires high overdamping near $\dot{q} = 0$ and this
slows down the response for low target speeds.
10. The general problem of rate finding and smoothing is
considered briefly from two angles - as a problem in
approximating a certain given transfer admittance and as a problem
in finding the form of a differential equation. The first
method, based on a linear differential equation, leads to
tentative designs which I think would be an improvement over the
present one. The second method indicates the possibility of
still more improvement if non-linear equations can be
satisfactorily analyzed.
ANALYSIS OF THE DEFLECTION MECHANISM
General Considerations. The deflection mechanism is a device
designed to find 5i mechanically from the formula
• in*! = Sa^ tp
having cne shaft whose rate of turning is£a and another whose
angular position is Jj> t?f giving c-t as the position of a shaft.
The system is also supposed to smooth out small errors in^a*
The mechanism, as actually constructed, is shown in
Figure 1. By a rearrangement of adders, it may be drawn as shown
in Figure 2. Incidentally, the device of rearranging and combining
adder units is frequently useful in studying these systems. In
this case it both clarifies the physical operation and simplifies
the mathematical analysis. The box IV on the right of Fig. 1
represents two adders with, essentially, a common shaft. The
output is equal to the sum of the inputs with the indicated signs
prefixed. A variable associated with a shaft represents the
angular position of that shaft unless specifically stated otherwise.
Gears are omitted from the diagram but included as coefficients
in the equations. It may also be worthwhile to point out that the
best method of setting down the equation of such a system is
usually the following:
1. Considering only the integrators and function devices,
label the various shafts using the minimum number of variables,
working backward from driven to driving shafts. Thus if the
output of an integrator is labeled z, its displacement is $\dot{z}$
(assuming constant disk rate). If the output of an x to ln x gear
is sin u, its input is $e^{\sin u}$. Working backwards gives the
differential instead of the integral form of the equation.
2. Now concentrate on the adders, grouping together as
many as possible, and write the equations of constraint. These
will be the equations of the system.
I find the use of electrical analogues very useful in
understanding these devices and have used throughout a notation
which emphasizes this idea.
As the machine is drawn in Fig. 2, it consists of three
independently operating units. The output of the first is a
single shaft serving as input to the second, the output of the
second a single shaft feeding the third, and the output of this
being a shaft used as S 3,
The operation is roughly as follows: Integrator I
multiplies its disk rate by its displacement, so that the rate
of turning of its output is y = ^0 tp£a» The actual position of
this y shaft can carry no significance. It is
y ■
p. tp2a dt +• y0
a variable which depends on the entire previous history of the
sighting telescopes, to say nothing of possible integrator slippage.
At two different times, with a target at the same position and
speed, this shaft would have entirely different angular positions
but the same rate of turning.
The output of integrator I feeds into the middle part
of the system, which is exactly the rate finder of most older
directors. This part of the device seems to me not only
superfluous but actually detrimental to the operation. It is
equivalent to an R, L circuit (Fig. 3) with impressed voltage y and
output x, the voltage across the inductance
3. A small response h(t) for the function g(t).
High frequencies in g(t) appear practically
undiminished and in the same phase in h(t) since the
impedance is high compared to R.
Thus
      $x = \frac{L_1}{R_1} A + K e^{-\frac{R_1}{L_1} t} + h(t)$
In adder III, x is added to y in equal proportions to give e.
      $e = y + \frac{L_1}{R_1} A + K e^{-\frac{R_1}{L_1} t} + h(t)$
As we pointed out above, y already contains an irrelevant additive
constant, so the addition of another, $\frac{L_1}{R_1} A$, which happens to be
proportional to the target rate, is of no possible significance. The
term $K e^{-\frac{R_1}{L_1} t}$ certainly is only detrimental, being an unwanted
transient. For a time I thought that the reason for the middle
part of the machine was the final term h(t). For high
frequencies this is approximately g(t), and might be used to buck out
these high frequency following errors, much as was done in some
early radio circuits to reduce a-c hum. However, a study of the
design diagrams shows that the two error functions are actually
in phase, as I have indicated in the equation, so that these high
frequency errors are added, making the situation worse. Even if
the phase of x were reversed on entering adder III, I think it
doubtful whether the presence of this part of the system would be
justifiable. It would be necessary to show that the frequencies
were high enough so that the two actually did cancel, and also
that the disadvantages of the transient term did not overcome the
advantages obtained. Note that the middle part can function in
no way as a rate finder. The right hand part of the machine does
its own rate finding, as we will see, and the rate found by the
middle part could not possibly be used because of the undetermined
constant in y.
We proceed now to the third part of the machine, which
is the major concern of the study. Concentrating on the adder IV,
the equation of the system is obviously
      $L \frac{d}{dt} \sin^{-1} \dot{q} = e - S q - R \dot{q}$
or
      $S q + R \dot{q} + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = e$
This is the equation of a series R, L, C, circuit with the in-
ductance a function of the current passing through it. Induc-
tance may be defined by the Lagrangian equations or by
and it is clear from the above equation that
      $\Lambda i = L \sin^{-1} i$
or
      $\Lambda = \frac{L \sin^{-1} i}{i}$
This function varies as shown in Figure 4. For our work, however,
a more useful parameter is what is sometimes called the differential
inductance, which may be defined by
      $L_d = \frac{d(\Lambda i)}{di}$
so that in our case
      $L_d = \frac{L}{\sqrt{1 - i^2}}$
This inductance is useful when we have an equilibrium current $\dot{q}_0$
and are considering the effect of small variations about this
equilibrium. Omitting second order terms, the system will be equivalent
to one with constant R, L, C parameters, the inductance being
taken as $L_d$. The variation of $L_d$ with current is shown in Figure 5.
The action is the opposite of that of a "swinging" choke where,
because of saturation, the differential inductance decreases with
large currents.
The mechanical idea behind the operation of this system
is quite simple. Suppose shaft e to be turning at a constant rate.
The system will be in equilibrium if the displacement of integrator V
is such as to make its output feeding into the adder equal and
opposite to e, and the displacement of integrator VI at zero. Under
these conditions, shaft $\dot{q}$ measures the rate of e and shaft V, the
output of the device, the arcsin of this rate. If the rates are
not correct, the adder changes the second derivative shaft in
such a direction as to equalize the rates. The $\dot{q}$ shaft serves as
a damper to prevent continual oscillation about the equilibrium
position.
MATHEMATICAL THEORY (Backlash not Present)
Differential Operation
If e is turning at a constant rate and the system is at
equilibrium, and then a small differential disturbance is applied
to the system, it will clearly respond very nearly like an R, L,
C circuit with constant parameters, the inductance used being the
differential inductance for the equilibrium current
      $L_{eff} = \frac{L}{\sqrt{1 - \dot{q}_0^2}}$
Such a system has a time constant of
      $T = \frac{2 L_{eff}}{R} = \frac{2L}{R \sqrt{1 - \dot{q}_0^2}}$
It is critically damped if
      $R^2 = 4 L_{eff} S = \frac{4 L S}{\sqrt{1 - \dot{q}_0^2}}$
which, of course, only occurs at
      $\dot{q}_0 = \pm \sqrt{1 - \frac{16 L^2 S^2}{R^4}}$
For values of $\dot{q}_0$ greater in absolute value than this, the system is
oscillatory; for values less, overdamped.
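
The relations above can be tried numerically. The following is a minimal sketch, assuming only the formulas for the differential inductance, the time constant and the critical-damping condition; the function names and the numerical values of R, L, S are illustrative assumptions and are not taken from the mechanism.

```python
import math

def effective_inductance(L, q_dot0):
    """Differential inductance L / sqrt(1 - q_dot0**2) at the equilibrium rate."""
    if abs(q_dot0) >= 1.0:
        raise ValueError("the equilibrium rate must satisfy |q_dot0| < 1")
    return L / math.sqrt(1.0 - q_dot0 ** 2)

def time_constant(R, L, q_dot0):
    """Time constant 2 * L_eff / R of the linearized circuit."""
    return 2.0 * effective_inductance(L, q_dot0) / R

def critical_rate(R, L, S):
    """|q_dot0| at which R**2 == 4 * L_eff * S, or None if R**2 < 4 * L * S."""
    x = 1.0 - 16.0 * L ** 2 * S ** 2 / R ** 4
    return math.sqrt(x) if x >= 0.0 else None

def is_oscillatory(R, L, S, q_dot0):
    """True when the linearized system is less than critically damped."""
    return R ** 2 < 4.0 * effective_inductance(L, q_dot0) * S

# Hypothetical constants, chosen only to exercise the formulas.
R, L, S = 4.0, 1.0, 1.0
q_crit = critical_rate(R, L, S)
print("critical |q_dot0|:", q_crit)
print("oscillatory at rest:", is_oscillatory(R, L, S, 0.0))
print("oscillatory at 0.99:", is_oscillatory(R, L, S, 0.99))
```

With these assumed constants the system is overdamped at low rates and oscillatory only close to the limiting rate, which is the behavior described in the text.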
Proof of General Stability with Linear e
In proving the stability of this system, I have used a
method which may be new in some respects. It was suggested by the
fact that in a non-dissipative mechanical system, the potential
energy U is a minimum at a point where the system is differentially
stable, and the method is, in a sense, a generalization of that
criterion. It is not, however, limited to differential stability,
or to non-dissipative systems. Since the method may be of use in
other investigations of this type, I will first describe it in
general terms.
Suppose we have a differential equation system in which
n variables and derivatives may be specified independently in the
initial conditions. We will say that the system is stable for all
initial conditions and all driving functions if any two solutions
of the system with the same driving functions approach each other
in the sense that
      $\lim_{t \to \infty} \sum_{i=1}^{n} |x_i - y_i| = 0$
where $x_1(t), x_2(t), \ldots, x_n(t)$ is one solution and $y_1(t), \ldots, y_n(t)$ the
other. If this limit is zero for certain types of driving functions,
we will say the system is stable for these functions.
Theorem: If a continuous function $Q(x_1 \ldots x_n, y_1 \ldots y_n, t)$ can be
found having the following properties:
1. $Q \geq 0$ for all $x_i$, $y_i$, t, the equality holding if and
only if $x_i = y_i$.
2. $\frac{dQ}{dt} \leq 0$ at all times, when the $x_i$ and $y_i$ are solutions
of the system, with the same driving function.
3. It is impossible for Q to remain indefinitely $\geq A > 0$.
Then the system is completely stable.
For the function Q is non-increasing but always $\geq 0$ and
must therefore approach a limit $A \geq 0$ as $t \to \infty$, but by 3, $A > 0$
is impossible, hence A = 0, and each $|x_i - y_i| \to 0$.
Conversely, it can be shown that if only a single
forcing function is involved, and the system is stable for this
function, a Q exists of the type described.
Roughly, the method is to find a "distance" or "error"
function Q between two solutions which is zero only when the so-
lutions are identical and which always decreases.
As an example of this method it is easy to prove the
complete stability of the ordinary R, L, C circuit with constant
parameters without solving the equation. The differential equation
is
      $S q + R \dot{q} + L \ddot{q} = e$
and we choose q and $\dot{q}$ as coordinates. Let two solutions be $q_1$,
$\dot{q}_1$ and $q_2$, $\dot{q}_2$ and consider the function
      $Q = \frac{S}{2}(q_1 - q_2)^2 + \frac{L}{2}(\dot{q}_1 - \dot{q}_2)^2$
Condition 1 is obviously satisfied. Now
      $\frac{dQ}{dt} = S(q_1 - q_2)(\dot{q}_1 - \dot{q}_2) + L(\dot{q}_1 - \dot{q}_2)(\ddot{q}_1 - \ddot{q}_2)$
      $= -R(\dot{q}_1 - \dot{q}_2)^2 \leq 0$
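
The behavior of Q in this linear example can be checked numerically. The sketch below is an illustration only: it integrates two solutions of the constant-parameter circuit with the same driving function by a standard fourth-order Runge-Kutta step (an assumed numerical method, not part of the report) and records Q along the way. The constants, the driving function and the initial conditions are all hypothetical.

```python
import math

def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for y' = f(t, y), y a tuple of floats."""
    k1 = f(t, y)
    k2 = f(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)))
    k3 = f(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)))
    k4 = f(t + h, tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h / 6 * (p + 2 * q + 2 * r + s)
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))

def rlc(S, R, L, e):
    """Right-hand side of S*q + R*q' + L*q'' = e(t) with state (q, q')."""
    def f(t, y):
        q, qd = y
        return (qd, (e(t) - S * q - R * qd) / L)
    return f

def distance(S, L, y1, y2):
    """Q = S/2 (q1 - q2)^2 + L/2 (q1' - q2')^2 for two states."""
    return 0.5 * S * (y1[0] - y2[0]) ** 2 + 0.5 * L * (y1[1] - y2[1]) ** 2

def q_history(S, R, L, e, y1, y2, h=0.01, steps=3000):
    """Values of Q along two solutions driven by the same e(t)."""
    f = rlc(S, R, L, e)
    t, out = 0.0, [distance(S, L, y1, y2)]
    for _ in range(steps):
        y1, y2 = rk4_step(f, t, y1, h), rk4_step(f, t, y2, h)
        t += h
        out.append(distance(S, L, y1, y2))
    return out

# Hypothetical constants, driving function and initial conditions.
history = q_history(S=4.0, R=0.5, L=1.0,
                    e=lambda t: 2.0 * t + math.sin(3.0 * t),
                    y1=(1.0, 0.0), y2=(-1.0, 0.5))
print("Q at start:", history[0], "Q at end:", history[-1])
```

Because the equation is linear, the difference of the two solutions does not depend on the particular driving function, which is why an arbitrary e(t) can be used here.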
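      $\ldots + \frac{S}{2}\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)^2$
Obviously the minimum of Q with respect to q occurs at
      $q = \frac{At}{S} + \frac{B}{S} - \frac{RA}{S^2}$
Also
      $\frac{\partial Q}{\partial \dot{q}} = \frac{L\left(\dot{q} - \frac{A}{S}\right)}{\sqrt{1 - \dot{q}^2}}$
which vanishes only for $\dot{q} = \frac{A}{S}$. It is readily verified that this
is a minimum, and that Q is zero at this point for any t. Now
      $\frac{dQ}{dt} = \frac{\partial Q}{\partial t} + \frac{\partial Q}{\partial q}\dot{q} + \frac{\partial Q}{\partial \dot{q}}\ddot{q}$
      $= S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)\left(\dot{q} - \frac{A}{S}\right) + \frac{L\left(\dot{q} - \frac{A}{S}\right)\ddot{q}}{\sqrt{1 - \dot{q}^2}}$
and
      $\frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = At + B - S q - R \dot{q}$
if q and $\dot{q}$ satisfy
      $S q + R \dot{q} + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = At + B.$
Hence
      $\frac{dQ}{dt} = \left(S q - At - B + \frac{RA}{S}\right)\left(\dot{q} - \frac{A}{S}\right)$
      $+ \left(\dot{q} - \frac{A}{S}\right)(At + B - S q - R \dot{q})$
      $= -R\left(\dot{q} - \frac{A}{S}\right)^2 \leq 0$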
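
The decrease of Q for the non-linear equation can also be observed numerically. The sketch below makes several assumptions that should be stated loudly: the part of Q depending on $\dot{q}$ is not legible in this copy, so the expression used in `error_function` was obtained by integrating the partial derivative $\partial Q / \partial \dot{q}$ given above and choosing the constant so that Q vanishes at $\dot{q} = A/S$; the integration uses a standard Runge-Kutta step; the state variable u = arcsin($\dot{q}$) is a convenience of this sketch; and all constants are hypothetical.

```python
import math

def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for y' = f(t, y), y a tuple of floats."""
    k1 = f(t, y)
    k2 = f(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)))
    k3 = f(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)))
    k4 = f(t + h, tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h / 6 * (p + 2 * q + 2 * r + s)
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))

def deflection_rhs(S, R, L, A, B):
    """State (q, u) with u = arcsin(q'): L*u' = A*t + B - S*q - R*sin(u)."""
    def f(t, y):
        q, u = y
        return (math.sin(u), (A * t + B - S * q - R * math.sin(u)) / L)
    return f

def error_function(S, R, L, A, B, t, q, qd):
    """Assumed form of Q: zero at equilibrium, with dQ/d(q') as in the text."""
    a = A / S
    rate_part = L * (math.sqrt(1 - a * a) - math.sqrt(1 - qd * qd)
                     - a * (math.asin(qd) - math.asin(a)))
    x = q - (A * t + B) / S + R * A / S ** 2
    return rate_part + 0.5 * S * x * x

def run(S=1.0, R=2.0, L=1.0, A=0.5, B=0.0, h=0.01, steps=4000):
    """Integrate from rest; return the history of Q and the final rate q'."""
    f = deflection_rhs(S, R, L, A, B)
    t, y = 0.0, (0.0, 0.0)
    qs = [error_function(S, R, L, A, B, t, y[0], math.sin(y[1]))]
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
        qs.append(error_function(S, R, L, A, B, t, y[0], math.sin(y[1])))
    return qs, math.sin(y[1])

q_values, final_rate = run()
print("Q at start:", q_values[0], "Q at end:", q_values[-1])
print("final rate:", final_rate, "expected A/S:", 0.5)
```

For these assumed values the computed rate settles at A/S and Q falls steadily to zero, in agreement with the rate $-R(\dot{q} - A/S)^2$ derived above.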
Note that this rate is identical with that found in the linear case.
Incidentally, it was by working backward from this rate that a
suitable function Q was first found.
For Q to approach a limit K > 0, it is necessary for $\dot{q} - A/S$
to approach zero, and q, therefore, to approach a linear function
of t differing by a constant from its equilibrium value. But from
the original differential equation $\ddot{q}$ must then approach a constant
different from zero, which contradicts $\dot{q} \to A/S$. This does not,
however, quite complete the stability proof, due to a certain
mechanical peculiarity of the system. Let us plot the equilevel lines
of Q against axes $X = q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}$ and $Y = \dot{q}$ (Figure 6).
The x to sin x gear in the actual mechanism has a limited
movement, and is prevented from going too far by a slip clutch and
stop. If $|\dot{q}| = K$, the stop prevents $|\dot{q}|$ from increasing any more.
The original equation is replaced by
      $\ddot{q} = 0$
until the pressure on the stop reverses. So far we have shown that
under the original equation Q always decreases. In terms of our
plot this means that if we start a solution inside the curve marked C,
the solution will certainly converge to the equilibrium position, for
the solution can never "escape" from C and hit one of the two lines
$Y = \pm K$, where the differential equation changes. When we are not on
one of these lines a solution will, in fact, spiral inward in the
clockwise sense, as may be seen by writing the differential equation
in the form
      $\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right) + \frac{R}{S}\left(\dot{q} - \frac{A}{S}\right) = -\frac{L}{S} \frac{\ddot{q}}{\sqrt{1 - \dot{q}^2}}$
Consider the signs of $\ddot{q}$ and $(\dot{q} - A/S)$ in the four quadrants about the
equilibrium position. In I, for example, $(\dot{q} - A/S) > 0$ and the X
coordinate of a solution must increase with t; $\ddot{q} < 0$ so $\dot{q}$ must decrease,
giving a clockwise sense to the motion. Similarly the other quadrants
may be verified. Some of the solutions starting outside of C will hit one of
the lines, but the solution will still be stable. It is easy to show,
by a study of the signs of the variables and their rates, that a
solution can only hit the upper line to the left of the point $P_1$ with
coordinates $X = \frac{R}{S}\left(\frac{A}{S} - K\right)$ and $Y = K$, and that if one does, it will
move along the line to the right until it reaches $P_1$ and then return
to the original equation. A similar situation holds for the lower
line. If we should start a solution on the upper line to the right
of $P_1$ it would leave the line immediately. The solution is always
horizontal (i.e. $\ddot{q} = 0$) on the line through $P_1$, the equilibrium
point and $P_2$.
If R = 0 the function Q is constant, since $\frac{dQ}{dt} = 0$, and
therefore the solutions of the equation
      $S q + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = At + B$
are the equilevel curves in Figure 6.
I have attempted in several different ways to generalize
this proof for arbitrary input functions e(t), but so far have
no completely rigorous proof. However, some of the arguments
come so near as to make me almost certain of complete stability.
It can be shown, for example, that two different solutions with
the same e(t) cannot definitely diverge; i.e. $|q_1 - q_2| + |\dot{q}_1 - \dot{q}_2|$
cannot become and remain greater than some positive constant
(assuming e and $\dot{e}$ bounded). Also if two solutions get close
together (with respect to both q and $\dot{q}$), they will certainly
converge.
The Effect of Backlash
In order to understand how backlash can cause oscillation,
let us first consider a much simplified case. Suppose we have a
second order linear system which is less than critically damped with
no backlash (Figure 7).
      $S q + R \dot{q} + L \ddot{q} = e$
If, at t = 0, we suddenly impress e = E (constant) on the system
($q = \dot{q} = 0$), the response is a damped oscillation (Figure 8).
Now in the mechanical system there are only two driven shafts,
A and B, and backlash only affects (directly) the operation
of these. Probably the greatest backlash is in the adder system
driving shaft A. Let us assume for a moment that this is the
only backlash present and that its action is as follows. When
shaft A reverses direction (i.e. when $\dddot{q} = 0$) there is a short
period during which A holds stationary, the amount of backlash
being $B_1$ as measured from the e shaft. It is clear that the
response of the system with backlash is the same as the response
would be if there were no backlash and at the time of reversal
(say $\ddot{q}$ was previously decreasing and is about to increase) we
turn the e shaft by an amount $B_1$ and in such a way as to keep
$\ddot{q}$ constant during this turning. Similarly at the next reversal
we give e a negative increment $B_1$, keeping $\ddot{q}$ constant through
the period of backlash. In other words, the response is that of a
system without backlash on which we impress as forcing function a
wave which is about as shown in Figure 9.
If the periods of backlash are comparatively short, the small
connecting portions (actually quadratic polynomials in time)
will have little effect on the response. That is, we can assume
a square topped wave with little error, in $\dot{q}$ or q especially, due
to the smoothing operation of the integrators (or, said another
way, due to the high impedance of the circuit to high frequencies).
Now suppose that there is a certain amount of backlash
in shaft B. The action of this is to cause the carriage of the
upper integrator to remain stationary for a small period when
$\ddot{q} = 0$. The same effect would be achieved if, at this time, we
suddenly impressed on e a pulse which held the lower integrator
at zero and kept changing e at such a rate as to keep the lower
integrator there. We keep the integrator at zero long enough so
that its output would have turned an amount equal to the backlash
in B and then suddenly return it to its proper value. This means
that the area of the pulse must equal the backlash. The shape of
this pulse would be a linear function of time, but here again it
is not highly significant.
The entire system may thus be replaced by one which is
free of backlash and subject to a driving function of the type
shown in Figure 10, where $B_1$ is the backlash in A as measured
from e and $B_2$ is the amount in B as measured from e (in the sense
that if e covers an area $B_2$, shaft B moves an amount equal to its
backlash).
It is easy to see from our diagram that this forcing
function is in the correct phase to sustain the oscillation
instead of decay.
The fundamental component of this forcing function is
easily found. We have
      $A_1 = \frac{2}{T} \int_0^T e \sin \frac{2\pi t}{T} \, dt$
e may be split into a sum - one term for the square wave and
one for the pulse-like $B_2$ part. The $B_2$ pulse is all concentrated
near the center of the sine wave where it is nearly unity. Hence
approximately
      $A_1 = \frac{2}{T} \cdot 2 \int_0^{T/2} \frac{B_1}{2} \sin \frac{2\pi t}{T} \, dt + \frac{4 B_2}{T}$
      $= \frac{2 B_1}{\pi} + 4 f_0 B_2$
The period T of this oscillation is the natural damped period
of the system, to within a small error of size comparable to the
length of time during which backlash is effective. Hence its
frequency is approximately
      $f_0 = \frac{1}{T} = \frac{1}{2\pi}\sqrt{\frac{S}{L_d} - \frac{R^2}{4 L_d^2}}$
and the magnitude of the fundamental component of the response $\dot{q}$
is
      $\frac{\frac{2 B_1}{\pi} + 4 f_0 B_2}{\sqrt{R^2 + \left(\omega_0 L_d - \frac{S}{\omega_0}\right)^2}}$
Providing the quantity $\frac{2 B_1}{\pi} + 4 f_0 B_2$ is small, the
deflection mechanism will behave linearly about its equilibrium
position and the above formulae would approximately hold. If
$|\dot{q}_0| \neq 0$ the equilibrium value of inductance $\frac{L}{\sqrt{1 - \dot{q}_0^2}}$ would
probably be as good as any to use, since the differential inductance
is greater on one side and less on the other. At $\dot{q} = 0$ the inductance
is greater on each side and a somewhat higher value should be used,
depending on $\frac{2 B_1}{\pi} + 4 f_0 B_2$. If the system is more than critically
damped, q may or may not have an inflection point depending on the
initial conditions. If they are such that the driven shafts do
not reverse, backlash cannot take effect and there should be no
oscillation. However, if they do reverse once, the system may
receive the equivalent of a "kick" in such a direction as to
cause another reversal and so on, so that oscillation is set up.
This problem has not been very well decided, but if this happens,
the amplitude formula above should still hold, while the frequency
formula will not.
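
As a worked illustration of the amplitude estimate, the sketch below evaluates the fundamental of the equivalent forcing function and divides it by the impedance of the linearized circuit. The frequency is taken to be the ordinary damped natural frequency of a series R, L, C circuit with S as the reciprocal capacitance; that expression is an assumption of this sketch, since the printed frequency formula is not legible in this copy. Function names and numerical values are hypothetical.

```python
import math

def damped_frequency(R, L_d, S):
    """Damped natural frequency (cycles per unit time) of the linearized circuit."""
    disc = S / L_d - R ** 2 / (4.0 * L_d ** 2)
    if disc <= 0.0:
        raise ValueError("system is not less than critically damped")
    return math.sqrt(disc) / (2.0 * math.pi)

def forcing_fundamental(B1, B2, f0):
    """Fundamental of the equivalent forcing wave: 2*B1/pi + 4*f0*B2."""
    return 2.0 * B1 / math.pi + 4.0 * f0 * B2

def oscillation_amplitude(R, L_d, S, B1, B2):
    """Fundamental of the response rate: forcing fundamental over impedance."""
    f0 = damped_frequency(R, L_d, S)
    w0 = 2.0 * math.pi * f0
    impedance = math.sqrt(R ** 2 + (w0 * L_d - S / w0) ** 2)
    return forcing_fundamental(B1, B2, f0) / impedance

# Hypothetical circuit constants and backlash amounts.
f0 = damped_frequency(1.0, 1.0, 4.0)
amp = oscillation_amplitude(1.0, 1.0, 4.0, B1=0.01, B2=0.002)
print("frequency:", f0, "amplitude:", amp)
```

Halving both backlash amounts halves the computed amplitude, which matches the remark that reducing gear backlash reduces the oscillation proportionately without removing it.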
The question of "spring backlash", i.e. undesired effects
due to elastic deformations of shafts and mounting plates, has been
raised. According to Hooke's Law the angular strain in a shaft
is proportional to the applied torque. This torque in a shaft
whose position is x(t) can probably be very well approximated by
an equation of the form
      $T = \pm K_1 + K_2 \dot{x} + K_3 \ddot{x}$
the first term, whose sign is that of $\dot{x}$, being due to a coulomb
friction load, the second to a viscous friction load and the third
an accelerating torque.
It is clear that the coulomb friction term $K_1$ can be
combined with the ordinary gear type backlash treated above, and
acts, therefore, like a periodic forcing function. The effect of
the other terms is quite different. Their presence causes small
changes in the parameters R, L and S of the circuit and also
adds higher derivatives to the equation. Let us consider only the
spring in the shafts feeding $L \ddot{q}$ (i.e. assume q driven
(Sq - P1 q - Pz q)
(R 4 - fx q - ig «')
or
Sq + (R-Pi) q
'F2 - *1. 1
- r2 V = (e- «x i - a2e) - eX(t)
Spring in the drive to $\dot{q}$ has a similar effect, although
complicated by the non-circular sine gears.
If e is a linear function of t, so is $e_1$, and the forcing
function thus contains nothing to create a sustained oscillation.
The left-hand side differs only by small quantities from the ideal
equation
      $S q + R \dot{q} + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = e_1$
and will therefore surely approach the solution
Thus we see that the "spring type" of backlash cannot cause
sustained oscillation as the "gear" type of backlash can. However,
if the gear type is present, the spring type can aid oscillation
by reducing the damping. It may be necessary to overdamp in some
cases in order to get an effective critical damping.
It should be pointed out that the gear type of backlash
may not be quite as simple as we have assumed, particularly in the
shafts driving $L \ddot{q}$. If the integrator carriage load is large
compared to the friction loads in the adders and gears, then we
are probably justified in assuming that gear pressures in the
drive only reverse when the driven shaft reverses. However, if
this is not the case, a backlash effect can easily take place at
other times, for example when one of the shafts feeding the adder
reverses, without necessarily reversing the driven shaft.
The situation could become quite complicated, the equivalent input
function containing several different sized steps occurring at
different times. However, the fundamental frequency should still
be approximately the natural damped frequency of the system,
providing the backlash effects are small and occur only during a small
fraction of the time.
The fact that backlash can cause a sustained oscillation
leads to a criticism of the design of the mechanism, in particular
to the method whereby the arcsin function is obtained. Note that
reducing the amount of gear backlash $\frac{2 B_1}{\pi} + 4 f_0 B_2$ will reduce the
amplitude of oscillation proportionately, but apparently the only
way to eliminate it completely is to at least critically damp
the system for all equilibrium points, so that the shafts do not,
in general, reverse direction. In the deflection mechanism as
it stands, this would be distinctly disadvantageous, for if we
critically damp at the maximum values of $|\dot{q}|$ (the governing
points) the system will be much over-damped near $\dot{q} = 0$, and in
fact for most values of $\dot{q}$, due to the shape of the inductance
curve.
Another related argument against the manner of getting
the arcsin is that the response to high frequency error functions
depends on the value of $\dot{q}$. It seems to me that the treatment of
error functions should be independent of the target speed -
what is best for one will be best for another - since the prediction
error we can tolerate is an absolute quantity, not dependent on the
target speed. There may be some objection to this argument on the
grounds that at higher target speeds the error function is apt to
be larger, and hence the circuit should have a larger impedance,
but even so it would only be accidental if the peculiar variation
introduced by the sine gear was anything like an approximation to
the desired variation.
Finally, a minor argument against the position of the
sine gear is that the equation becomes so difficult to handle
mathematically. A design of this type must be largely intuitive
or experimental - there is not much chance of choosing the
constants for the best operation by a mathematical formulation, or of
determining the speed of response etc. analytically.
These difficulties might be avoided in several ways. The
arcsin might, for example, be introduced as in Figure 11.
No doubt the reason this was not done was because with $|\dot{q}|$ near
1, running the sin x gear backward is not mechanically practical,
the gearing up ratio being too great. This objection could be
overcome in two ways - either a new gear K arcsin x to x (K large)
could be used and the parameters R, L, S all decreased by a factor
of K (or the integrator disks might be speeded up in suitable
ratios), or, if this were not mechanically feasible, a rapid
response servo mechanism could be introduced in the output, Figure 12.
This system can, by the way, be solved in closed analytic form
when $\dot{e}$ is a constant, and reduced to a quadrature in any case.
The essential feature of this circuit is that the functions of
rate finding and smoothing, and of taking the arcsin, have been
isolated. Each part can be designed to do its own job the best
without compromise. It may be noted that the arcsin circuit
above also performs a smoothing operation which depends on target
speed. By suitable choice of the parameters we can make this
large or small as we desire.
The Ideal Rate Finder and Smoother
Let us consider the problem of rate finding and
smoothing from a general standpoint and ask what mathematical
operation a machine should perform to act as the "best possible" rate
finder. Of course, this question has many answers, depending
chiefly on what assumptions we make as to the input function,
and what mathematical limitations we put on the machine. We
shall assume throughout that the input function e(t) consists of
a series of linear parts with curved connecting portions and with
a small superimposed error function, and that we only desire the
rate during (that is, some time after the start of) a linear part.
In this section we assume there are no limitations whatever on the
machine - that we can build a machine to perform any operations we
can describe, in particular those a mathematician might use to
solve the problem. Now there is considerable experimental and
theoretical justification to the theory that the best way to fit
a curve of a given type to a set of points subject to an
observational error is in the least square sense. If we assume this to
be true in our case, and attempt to fit a straight line to the
last a seconds before $t_1$ of the curve e(t), we must minimize the
integral
      $I = \int_{t_1 - a}^{t_1} [e - (At + B)]^2 \, dt$
with respect to A and B. The quantity a represents the length of
the curve used in the fitting process. We would like to use as
much of the curve as actually represents a linear segment to get the
best accuracy, but certainly no more. A person doing the curve
fitting could look at e(t) and see fairly well where the curve
showed a real tendency to depart from linearity, and select
accordingly. Mathematically it could be done as follows. Suppose the
standard deviation of the error is $\sigma$ and that errors of more than,
say, $4\sigma$ are almost certainly due to a significant departure from
linearity in the curve. We could choose a such that it is as large
as possible without making the error $|e - (At + B)|$ (A, B chosen to
minimize I) for $t_1 - a \leq t \leq t_1$ greater than $4\sigma$. In other words we use
as much of the curve as we can assume linear within observational
errors. As a final refinement of the solution it might be desirable
to include a weighting function W(t, a) in the integral I, weighting
the more recent values more heavily. The final evaluation of the
rate is then the value of A given when we minimize the function
      $I(A, B, a) = \int_{t_1 - a}^{t_1} [e - (At + B)]^2 \, W(t, a) \, dt$
on A and B, a fixed, giving A and B as functions of a, and then
choose a as large as possible with
      $|e - (At + B)| \leq K\sigma, \quad t_1 - a \leq t \leq t_1$
This solution can be put into a more explicit form,
but even when greatly simplified it appears that it would be quite
difficult to carry out the calculations accurately by mechanical
means. The main difficulty is that apparently such a machine must
be capable of remembering exactly the past history of an arbitrary
function, e or something derived from it. The only methods I know
of doing this are quite inaccurate, or else very complex, and it
seems likely that the gain in mathematical precision of the above
formulation would be more than offset by a loss in mechanical
precision.
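
The least-squares procedure just described can be sketched on sampled data. What follows is an illustration under stated assumptions, not a proposal for the machine: e(t) is taken as a list of samples rather than a continuous curve, the weighting function W is taken as uniform, and the window length a is found by trying the longest trailing window first. The function names and the test data are hypothetical.

```python
def fit_line(ts, es):
    """Least-squares slope A and intercept B of e ~ A*t + B (uniform weights)."""
    n = len(ts)
    t_mean = sum(ts) / n
    e_mean = sum(es) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    ste = sum((t - t_mean) * (e - e_mean) for t, e in zip(ts, es))
    A = ste / stt
    return A, e_mean - A * t_mean

def max_residual(ts, es, A, B):
    """Largest absolute deviation of the samples from the fitted line."""
    return max(abs(e - (A * t + B)) for t, e in zip(ts, es))

def rate_estimate(ts, es, sigma, K=4.0, min_points=3):
    """Slope over the longest trailing window whose residuals stay within K*sigma.

    Returns (A, a), where a is the length of the window actually used.
    """
    n = len(ts)
    for start in range(0, n - min_points + 1):  # longest window first
        A, B = fit_line(ts[start:], es[start:])
        if max_residual(ts[start:], es[start:], A, B) <= K * sigma:
            return A, ts[-1] - ts[start]
    # No window satisfies the bound: fall back to the shortest one.
    A, B = fit_line(ts[-min_points:], es[-min_points:])
    return A, ts[-1] - ts[-min_points]

# Hypothetical input: rate 1 up to t = 5, then rate 3 (two linear parts).
ts = [0.1 * i for i in range(101)]
es = [t if t < 5.0 else 5.0 + 3.0 * (t - 5.0) for t in ts]
rate, window = rate_estimate(ts, es, sigma=0.01)
print("rate:", rate, "window length:", window)
```

The sketch stores the whole past history of e, which is exactly the requirement the text identifies as the main obstacle to a mechanical realization.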
Differential Analyzer Types of Machines
To become a bit more practical, let us now confine our
attention to machines of what might be called the differential
analyzer type. By this, we mean machines constructed of a finite
combination of adders, integrators, and function elements (e.g.
non-circular gears). Two shafts e(t) and kt enter the machine
and one shaft u(t) leaves the machine. It can be shown that any
such system must satisfy a differential equation of the type
      $f(q, \dot{q}, \ldots, q^{(n)}, t) = e(t)$
with
      $u(t) = q^{(i)}.$
First, we ask what can be said about the form of this equation to
make the machine act as a satisfactory rate finder in our sense.
1. With the same initial conditions and the same e(t) the
machine should certainly respond the same independent of
the time of start, hence f does not depend on t.
2. When e = At + B the equation must have an equilibrium solution
      $q^{(i)} = A, \quad q^{(i+1)} = 0, \quad q^{(i-1)} = At + C.$
If i > 1, the carriage of an integrator will be continuously moving
in the equilibrium condition. This does not seem practical, for the
initial conditions may be anything depending on past history, and
the integrator would surely go off scale in many cases. Obviously
from the equilibrium solution, i is not 0, for this would imply a
constant equal to a linear function of time. Hence i = 1 and
$\dot{q} = u(t)$.
3. Let
      $f(x, y) = f(x, y, 0, \ldots, 0)$
Due to the equilibrium solution
      $f(At + C, A) = At + B$
for all A, B, t. Differentiating with respect to t,
      $\frac{\partial f(x, y)}{\partial x} A = A$
so that
      $f(x, y) = x + h(y)$
4. Assuming f is fairly "well behaved", we have near $\ddot{q} = \dddot{q} = \ldots$
$= q^{(n)} = 0$ (i.e. near equilibrium)
      $f = f(q, \dot{q}, 0, 0, \ldots, 0) + \frac{\partial f}{\partial \ddot{q}} \ddot{q} + \ldots + \frac{\partial f}{\partial q^{(n)}} q^{(n)}$
      $= q + h(\dot{q}) + a_2 \ddot{q} + \ldots + a_n q^{(n)}$
and the differential operation depends on the coefficients
$a_2 \ldots a_n$ and $h(\dot{q})$. As this differential operation should not
depend on t, the $a_i$ must be independent of q, for in equilibrium
q changes with t. They may depend on $\dot{q}$, however, in which case the
differential operation depends on the target speed, which may or
may not be desirable. In the deflection mechanism this is the
case, $a_2 = \frac{L}{\sqrt{1 - \dot{q}^2}}$.
5. With $\dot{q}$ near A the above reduces to
      $f = q + a_1 \dot{q} + a_2 \ddot{q} + \ldots + a_n q^{(n)} + b$
where $a_1 = h'(A)$ and $b = h(A) - A h'(A)$. To eliminate backlash
oscillation the roots of this equation should all be real, and for
stability all should be negative, for all desired A.
6. For complete stability, there are no doubt further requirements
on the form of f. This problem, however, is still unsolved.
The above are only requirements on the form of f so that
it actually does find a satisfactory rate. To find the best form
of f would require a very elaborate mathematical analysis, if possible
at all.
If we restrict our machine still further and assume a
linear differential equation with constant coefficients, it is
possible to give a fairly rational analysis leading to the best
values of the coefficients. The question is this. Given the
equation
      $a_0 q + a_1 \dot{q} + \ldots + a_n q^{(n)} = e$
what values of the coefficients $a_0 \ldots a_n$ give the best
rate-finding smoothing properties? From what we said above, it seems
that the characteristic equation
      $a_0 + a_1 p + \ldots + a_n p^n = 0$
should have only real negative roots and that the rate found will
be $\dot{q}$. We may normalize the equation by assuming $a_0 = 1$ so that
$\dot{q}$ is actually the rate and not merely proportional to it. In
the Heaviside symbolic notation, we have
      $\dot{q} = \frac{p}{\prod_{k=1}^{n} (b_k p + 1)} \, e$
writing the polynomial in the factored form. The $b_k$ are positive
real numbers and are the time constants in the transient part of
the response. We assume the $b_k$ arranged in increasing magnitude.
Let us frame the problem as follows. Keeping the speed
of response of the circuit the same, what values of the $b_k$ give
the best attenuation of the error function? Of course, the trouble
appears in trying to decide what we mean by keeping the speed of
response the same. One answer is that we keep the maximum time
constant, that is $b_n$, the same. This may be partially justified
on the following grounds:
1. For "almost all" initial conditions,
the term $A e^{-t/b_n}$ will eventually dominate the transient response,
the other terms becoming arbitrarily small in comparison. The
only time when this fails is when the coefficient happens to
come out zero.
2. In the worst cases (other coefficients small in comparison)
the $b_n$ term dominates for all t, and the machine should perhaps be
designed with the worst conditions as governing.
3. If we use this criterion, it is easy to show that for best
attenuation of error frequencies all the b_k should be equal. For
the magnitude of the transfer admittance (e to q') is

    |Y(j\omega)| = \frac{\omega}{\prod_k \sqrt{1 + b_k^2 \omega^2}}

which is obviously smallest when each b_k is made as large as
possible, for all frequencies. That is, each b_k = b_n, the maximum.
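The equal-time-constant argument can be checked numerically. The sketch below assumes the magnitude of the admittance from e to q' has the form omega / prod(sqrt(1 + (b_k omega)^2)), which is how the factored Heaviside expression is read here; the function name and the sample time constants are hypothetical and chosen only for illustration.

```python
import math


def admittance_magnitude(omega, time_constants):
    """|Y(j*omega)| for q' = p e / prod(b_k p + 1), under the assumed form."""
    mag = omega
    for b in time_constants:
        mag /= math.sqrt(1.0 + (b * omega) ** 2)
    return mag


b_max = 2.0                      # hypothetical largest time constant b_n
unequal = [0.5, 1.0, b_max]      # smaller b_k, same maximum
equal = [b_max, b_max, b_max]    # every b_k raised to the maximum

for omega in (0.1, 1.0, 10.0):
    print(omega,
          admittance_magnitude(omega, unequal),
          admittance_magnitude(omega, equal))
```

At every frequency the second printed magnitude is no larger than the first, since raising any b_k can only enlarge the denominator.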
Another way the "same speed of response" might be in-
terpreted is in terms of the expected area under the transient
time curve. Keeping the standard deviation of this area con-
stant seems to give the same evaluation of the bk as above but
there are certain statistical assumptions in my proof that may
render it invalid.
If the characteristic equation has real roots, it may
be set up nicely as in Figure 13.
This circuit appears to have an advantage from the backlash
point of view over the more obvious one shown in Figure 14.
It seems quite possible, however, that the use of nonlinear equations
could offer a real advantage. Consider the equation

    [equation illegible in scan]

where the three coefficients are functions of [symbol illegible in scan].
When the system is at equilibrium, it acts approximately like

    [equation illegible in scan]

and these three constants could be adjusted to give critical damping
and a good attenuation of the error function frequencies. On
the other hand, when we are not at or near equilibrium, [symbol
illegible in scan] is (usually) considerably different from zero. The
values of the three coefficients could be adjusted to give a very
rapid response, and thus approach the equilibrium position faster.
It is possible, however, that there is some fundamental error in
this reasoning, for example, that an attempt to do this would
necessarily cause oscillation.
A HEIGHT DATA SMOOTHING MECHANISM
Claude E. Shannon
5/26/41
A HEIGHT DATA SMOOTHING MECHANISM
The schematic diagram of a new type of height data
smoothing mechanism is shown in Figure 1. The discontinuous
height data e(t) is fed into the input shaft at intervals.
This drives a differential, connected also to the ball carriage
and roller of an integrator whose disk is turned by a
constant speed motor. A correcting hand wheel and the integrator
roller feed another differential whose output is the
output of the device. The output and input of the machine are
compared through a differential feeding a dial. The operator
is supposed to turn the handwheel in such a way that the positive
and negative oscillations of the dial about zero are
equal.
The actual height of the target h(t) is a continuous
function of time and we may assume that just after each reading
e(t) is an approximation to this. Thus h(t) and e(t) might
be as shown in Figure 2.
The shaft y(t) clearly satisfies the equation

    (1)  y + \frac{1}{m} y' = e(t).

The x shaft satisfies

    (2)  x(t) = y(t) + c(t)

and the dial reads

    (3)  D(t) = e(t) - x(t).
During the period between height readings the position of the
e(t) shaft is constant, say e(t_n), the reading taken at t_n, so that

    y + \frac{1}{m} y' = e(t_n)
    y = e(t_n) + A_n e^{-m(t - t_n)},    t_n \le t < t_{n+1}.

Since y is obviously continuous, it will follow a curve consisting
of a series of connected exponentials, each with the
same time constant, 1/m. The continuity of the curve implies

    e(t_n) + A_n = e(t_{n-1}) + A_{n-1} e^{-m(t_n - t_{n-1})}.

Assuming the intervals between readings the same, say a seconds,
the response y for two different time constants, ma = ln 2 and
ma = ln 10, are shown in Figure 3.
The larger the time constant, the more the lag in
response of y(t), but the smoother the curve. This may be
seen another way: the e to y system is equivalent to an R,
L circuit with position of shafts analogous to voltage as shown
in Figure 4. With L/R small y follows e closely including the
irregularities. With L/R large y(t) is smooth compared to e but
lags considerably.
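The trade-off between lag and smoothness can be illustrated with a short sketch. It assumes equation (1) in the form y + (1/m) y' = e(t), with e held constant between readings spaced a seconds apart, so that over one interval y relaxes toward the latest reading by the factor exp(-ma). The function name and the sample readings are hypothetical.

```python
import math


def smooth_staircase(readings, ma, y0=0.0):
    """Value of y at the end of each interval for y + y'/m = e.

    `ma` is the product m*a; e(t) equals readings[n] during interval n.
    """
    decay = math.exp(-ma)
    y = y0
    ends = []
    for e_n in readings:
        # Within the interval, y = e_n + (y_start - e_n) * exp(-m (t - t_n)).
        y = e_n + (y - e_n) * decay
        ends.append(y)
    return ends


readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical steady climb
fast = smooth_staircase(readings, math.log(10))        # ma = ln 10
slow = smooth_staircase(readings, math.log(2))         # ma = ln 2

print("lag with ma = ln 10:", readings[-1] - fast[-1])
print("lag with ma = ln 2: ", readings[-1] - slow[-1])
```

For a steady climb of one unit per reading the end-of-interval lag settles toward exp(-ma) / (1 - exp(-ma)), which is 1/9 of a step for ma = ln 10 and a full step for ma = ln 2.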
Movement of the hand wheel does not affect y(t) but
shifts x(t) up or down with respect to y. If the operator
turns the wheel to give equal positive and negative movements
of the dial, it may be seen that in the "steady state" (say
with h(t) = at) there is a constant lag even when the damping
is low and the interpolation nearly linear. In this case the
system bridges linearly between the mid-ordinates of the steps,
while actually it should bridge between the points (t_n + 0).
With higher damping the shape becomes worse but the interpolated
exponentials are nearer to the true curve most of the time. We
shall find a formula for the best time constant of the system
under the following assumptions:
1. That the "best" time constant is the one making the
   actual error least in the mean square sense.
2. That we may take as the true curve, so far as our
   knowledge goes, the linear interpolation between
   the points t_n + 0. This may be justified by the
   fact that the device cannot in any way perform
   higher order interpolation - the curve y(t) is convex
   upward whenever e(t) increased in its last step
   over the final value of y from the preceding step,
   and this is quite independent of the curvature of
   h(t).
3. That the system is in a "steady state", that is,
   that in the step under consideration y(t) ends at
   the same distance below e(t) as it was just before
   the step.
4. That the steps come at approximately equal intervals
   of a seconds.
An interval under these conditions is shown in
Figure 5. Here we assumed that the hand wheel was turned to
give a ratio of -2_ as deflection of the dial just after to
just before a step.
We have

    y = A e^{-mt}

with

    y(0) - b = y(a)
    A - b = A e^{-am}.

Hence

    y = \frac{b\, e^{-mt}}{1 - e^{-am}};

also

    x = y - y(0) + c
      = -b\, \frac{1 - e^{-mt}}{1 - e^{-am}} + c.
The integral of the squared error per second is then

    [expression illegible in scan; it is written in terms of k and D]

It is evident from physical considerations that the minimum of
this expression occurs for a fairly large D. In fact the error
curve was plotted for k = .5 (Figure 6) and the minimum is seen
to be at about 7 or 8. With D this large the above expression
is very nearly equal to

    [expression illegible in scan]

since e^{-D} is very small. To locate the minimum we have

    [derivative illegible in scan] = 0
    (6 - 8k) D - 16 = 0

whence

    D = \frac{8}{3 - 4k}.

For k = 1/2,

    D = 8.
Since the minimum is so flat (Figure 6) this formula is certainly
close enough. However a second approximation may be
found as follows: for x small, 1/(1 - x) = 1 + x approximately. Using this in
the exact expression to eliminate the denominators we get as a
second approximation

    [expression illegible in scan]

    0 = -8 + (3 - 4k) D + [6D(D+1) + 2D^3(k-1)] e^{-D}.

Using the first approximation to obtain the values involving
exponentials, a better value may be obtained. For k = 1/2 the
second approximation is D = 8.03. The first and second approximations
are plotted in Figure 7.
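The refinement step can be sketched in a few lines. This assumes the second-approximation condition reads 0 = -8 + (3 - 4k) D + [6D(D+1) + 2D^3 (k-1)] exp(-D), which is a reading of a badly damaged line in the scan, and it follows the text in evaluating the exponential terms at the first approximation D = 8/(3 - 4k). The function name is hypothetical.

```python
import math


def second_approximation(k, iterations=1):
    """Refine D by substituting the previous estimate into the exponential terms."""
    d = 8.0 / (3.0 - 4.0 * k)          # first approximation
    for _ in range(iterations):
        bracket = 6.0 * d * (d + 1.0) + 2.0 * d ** 3 * (k - 1.0)
        d = (8.0 - bracket * math.exp(-d)) / (3.0 - 4.0 * k)
    return d


print(round(second_approximation(0.5), 2))
```

With k = 1/2 a single substitution moves the estimate from 8 to about 8.03, in agreement with the value quoted in the text, and further iterations change it only in the fourth decimal place.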
With k = 1/2 the curve x(t) is plotted for an interval
with the "best" D, in Figure 8. It will be noted that the
curve is highly damped in comparison to the time between readings.
The RMS error is then equal to

    [value illegible in scan]

It is interesting to compare this with the RMS errors obtained
under other conditions. If the device is not used at all, but
a direct coupling made between the input and output, the RMS
error between the step function and the linear interpolation
between points t_n + 0 is

    \left(\frac{\bar{I}}{b}\right)^2 = \frac{1}{a} \int_0^a \left[0 - \left(-\frac{t}{a}\right)\right]^2 dt

    \frac{\bar{I}}{b} = \frac{1}{\sqrt{3}} = .577

so that the RMS error has been reduced to 40% of this value.
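The 40% figure can be checked by direct numerical integration. The sketch below rests on several assumptions that are not spelled out legibly in the scan: that k stands for c/b, that D stands for the product am, that x lies below e by c - b(1 - exp(-mt))/(1 - exp(-am)) during an interval, and that the true curve rises linearly by b over the interval. All function names are hypothetical.

```python
import math


def mean_square_error(k, D, steps=20000):
    """Mean squared error over one interval in units of b**2 (midpoint rule)."""
    scale = 1.0 / (1.0 - math.exp(-D))
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) / steps                        # t / a
        x_below_e = k - scale * (1.0 - math.exp(-D * tau))
        true_below_e = -tau                            # true curve rises above e
        total += (x_below_e - true_below_e) ** 2
    return total / steps


def first_approximation(k):
    """Large-D estimate of the best D, as given in the text."""
    return 8.0 / (3.0 - 4.0 * k)


rms_smoothed = math.sqrt(mean_square_error(0.5, first_approximation(0.5)))
rms_direct = 1.0 / math.sqrt(3.0)                      # direct coupling, .577 b
print(rms_smoothed / rms_direct)
```

Under these assumptions the ratio comes out a little under 0.4, consistent with the reduction stated above.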
In Figure 9, the output of the smoothing mechanism,
x(t), is plotted for a certain forcing function e(t), using
the "best" value of m. It may appear that the output is still
far from smooth, and this is in a sense true, but it must be
remembered that the variations in e(t) are here greatly exaggerated
over what would be expected in practice.
Finally it should be pointed out that a very material
improvement in operation could be obtained if the operator
were trained to turn the handwheel to obtain a ratio c/b
nearer to zero than 1/2. This, however, would probably be
impractical.
[Figures 1 and 2: schematic of the mechanism, with labels including DIAL and CORRECTING HAND WHEEL; not legible in the scan.]
SOME EXPERIMENTAL RESULTS
ON THE DEFLECTION MECHANISM
Claude E. Shannon
June 26, 1941
Some Experimental Results on the Deflection Mechanism
In a previous report, "A Study of the Deflection Mechanism and Some
Results on Rate Finders," a mathematical study was made of a new type of
deflection mechanism. The present paper is a further study of this device
and a report on some experimental results obtained on the M.I.T.
differential analyzer.
For convenience in reference, the schematic diagram of the machine
is repeated in Fig. 1. In the report mentioned, the utility of the
middle part of the device was questioned. This arose from a misunderstanding
of the basic assumptions underlying the design and was cleared
up in a conference with Dr. Tappert. The writer's analysis was under
the assumption that the mechanism was designed to find rates for linear
forcing functions only (i.e., that higher order terms were small by comparison),
and the analysis is still valid if this is true. However, in
practice, it appears necessary to assume higher order forcing functions
and the deflection mechanism is designed to give the correct steady state
rate (except for the non-linearity of the sine gear) for an arbitrary
quadratic forcing function. Actually the middle part (often referred to
hereafter as the "x" part) of the device is certainly well worth while,
as will be seen from some of our experimental curves.
If a linear mechanism has a transfer admittance Y(j\omega) from input
e(t) to output \dot{q}(t) then

    j\omega\, Q(j\omega) = Y(j\omega)\, E(j\omega)

where E and Q are the transforms of e and q. It is easily seen from
transform theory that if e(t) = at + b, a necessary and sufficient condition
that \dot{q}(t) \to a as t \to \infty is that

    \lim_{\omega \to 0} \frac{Y(j\omega)}{j\omega} = 1.

If this condition is satisfied the system may be called a first order
rate finder: after the transient has died out, the output is the derivative
of the input whenever the latter is linear. Similarly if

    Y(0) = 0,   Y'(0) = j,   Y^{(k)}(0) = 0,   k = 2, 3, \dots, n

we have an nth order rate finder: in the steady state it finds the rate
of an nth degree polynomial forcing function. In the deflection mechanism
we have a second order rate finder

    Y(j\omega) = j\omega + c_3 \omega^3 + c_4 \omega^4 + \dots

if we assume 1/\sqrt{1 - \dot{q}^2} nearly 1. A circuit for solving

    \delta = \sin^{-1} \dot{q}

under the same approximation, to the nth order is shown in Fig. 2. The
admittance here is approximately
admittance here is approximately
1 # a1(» ♦ a2(»2 ♦ ... + Vl(j<u)n+1 ^
the values of the constants in the mechanism are
1 » 4.63 J"»
y(» x S **oa r * J"
1 ♦ 4.63 5.73 (j-r ♦ 1.094 (»S
_ (1 ♦ 4.63 .1«Qj«rf
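The order of a rate finder can be read off from a power-series expansion. The sketch below assumes the admittance is a ratio of polynomials, Y(s) = s N(s)/D(s) with s standing for j omega, so that an nth order rate finder is one for which N(s)/D(s) = 1 + O(s^n). The constants 4.63, 5.73 and 1.094 are those of the linear system quoted in the report's conclusions; the function names are hypothetical.

```python
def series_of_ratio(num, den, terms):
    """Power-series coefficients of num(s)/den(s); lists hold ascending powers."""
    coeffs = []
    for n in range(terms):
        acc = num[n] if n < len(num) else 0.0
        for j in range(1, min(n, len(den) - 1) + 1):
            acc -= den[j] * coeffs[n - j]
        coeffs.append(acc / den[0])
    return coeffs


def rate_finder_order(num, den, terms=8, tol=1e-9):
    """Largest n (capped at `terms`) with Y(s)/s = 1 + O(s**n); 0 if none."""
    c = series_of_ratio(num, den, terms)
    if abs(c[0] - 1.0) > tol:
        return 0
    n = 1
    while n < terms and abs(c[n]) <= tol:
        n += 1
    return n


# Linearized deflection mechanism: Y(s)/s = (1 + 4.63 s) / (1 + 4.63 s + 5.73 s^2 + 1.094 s^3)
print(rate_finder_order([1.0, 4.63], [1.0, 4.63, 5.73, 1.094]))
```

The first nonvanishing error coefficient is the s^2 term, equal to -5.73, which is why the mechanism follows a quadratic forcing function in the steady state but not a cubic one.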
In the previous report it was pointed out that due to a clutch and
stop on the input to the sine gear values of \dot{q} were limited to two horizontal
lines (see Fig. 6 in that report). There is also a clutch and
stop on the displacement of the lower integrator. This effectively further
limits solutions to a parallelogram as shown in Fig. 3. Actually
the limitation is fictitious: the q shaft can turn an unlimited amount,
but when this stop is in effect the stability point moves at such a speed
as to be equivalent to q and \dot{q} moving along one side of the parallelogram.
Thus if we keep the stable point stationary paths of representative solutions
will be as indicated in Fig. 3.
The trial solutions taken on the differential analyzer may be classified
as follows:
I. Solutions taken with the mechanism as designed.
   A. Simple analytic forcing functions.
      1. e(t) = a
      2. e(t) = at + b
      3. e(t) = at^2 + bt + c
      4. e(t) = at^3 + bt^2 + ct + d
   B. Response for 8 typical target courses, the target vector
      velocity constant.
   C. The response to some error functions superposed on typical
      courses.
   D. An attempt to get backlash oscillation.
II. Approximately the same program although less extensively with the
    middle part eliminated.
III. A few runs with typical courses using three different third order
     rate finders.
The constants of the target courses used were as follows (see Fig. 4):

    Course   S_g (yds/sec)   r_m (yds)   h_m (yds)   \phi        r (yds)
    I        150             2,000       1,000       0°
    II       150             2,000         500       0°
    III      150             4,000       1,000       0°
    IV       150             2,000       2,000       0°
    V        150             4,000       4,000       -14.96°     4,000 - 40 t
    VI       150             2,000       1,000       -14.96°     2,000 - 40 t
    VII      96.6            3,000       1,000       -60°        3,000 - 115 t
    VIII     150             4,000         500       0°

For Course I, S_g = 150 yds/sec = 307 mi/hr.
The distribution of these courses is indicated in Fig. 5, together
with the approximate maximum range of the 3" A.A. gun (21 sec. fuse setting).
The actual input to the deflection mechanism is
r* s h t
a o p
but since it was desired to compare the actual output with the true
deflection

    \sin^{-1} \dot{e}

the quantity \dot{e} was plotted against t and integrated to provide the input.
To calculate \dot{e} the following method was found to be the simplest. We have
8 h t
' --P **-
o p
A computation schedule was set up based on this formula, working backwards
from the time of burst t + t_p to the present time.

    [Computation schedule, columns I to VII: I, t + t_p (assumed); II, h_p;
    III, r_p; IV, t_p (from ballistic curves); V, t; the remaining entries
    and the column formulas are illegible in the scan.]

The ballistic data used in getting t_p (IV) was read from the chart
Fig. 24 (opposite p. 59), Coast Artillery Field Manual, FM 4-110. The
value of t_p was merely read off corresponding to the computed values of
r_p and h_p.
If we assume as an approximation that the shell velocity is constant,
k yds/sec (i.e., that the equi-time of flight curves in the chart are
circles) so that with r constant

    k^2 t_p^2 = h_p^2 + r^2
    h_p = h_m + S_g (t + t_p)
    [third equation illegible in scan]

we can eliminate t_p and h_p from the system to obtain the following equation
between \dot{e} and t:

    [quadratic equation in \dot{e} illegible in scan]

Evidently the same curve \dot{e}(t) is obtained if h_m and S_g are both multiplied
by the same constant.
The differential analyzer set-up used is shown in Fig. 6. An attempt
was made to generate the sine function with two integrators solving

    [equation missing in scan]

but this was found impractical because of the large integrator loading
necessary, and an input table was used instead. Even in this case it was
necessary to use a very large scale factor on the independent variable
shaft due to the small integrating factors (1/32) of the differential
analyzer as compared to the ball type (about 1 under comparable conditions).
This resulted in solutions which represented, actually, 30 seconds
requiring 30 minutes of machine time.
The equations of the deflection mechanism are

    \dot{x} + .54 x = .54 \dot{e}
    \ddot{q} + 4.700 \dot{q} + 1.692 q = 1.692 e + 4.700 x

It was necessary to approximate the coefficients with available gear
ratios on the differential analyzer. Fortunately some very close approximations
were found. The equations actually set on the machine were

    \dot{x} + .54 x = .54 \dot{e}
    \ddot{q} + 4.706 \dot{q} + 1.694 q = 1.694 e + 4.706 x

The error is of the same order as the expected machine error.
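The two mechanism equations can be combined into a single third-order equation by eliminating x, which gives a quick check on the constants quoted elsewhere in the report. The sketch below assumes the equations read x' + alpha x = alpha e' and q'' + beta q' + gamma q = gamma e + beta x, with (alpha, beta, gamma) = (.54, 4.700, 1.692) as designed and (.54, 4.706, 1.694) as set on the analyzer. The function names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def combined_system(alpha, beta, gamma):
    """Eliminate x and normalize so the coefficient of q is 1.

    Returns (lhs, rhs): ascending coefficients of the operator acting on q
    and of the operator acting on e.
    """
    lhs = poly_mul([gamma, beta, 1.0], [alpha, 1.0])   # (s^2 + beta s + gamma)(s + alpha)
    rhs = [gamma * alpha, gamma + beta * alpha]        # gamma (s + alpha) + beta alpha s
    norm = lhs[0]
    return [c / norm for c in lhs], [c / norm for c in rhs]


for label, constants in (("designed", (0.54, 4.700, 1.692)),
                         ("as set", (0.54, 4.706, 1.694))):
    lhs, rhs = combined_system(*constants)
    print(label, [round(c, 3) for c in lhs], [round(c, 3) for c in rhs])
```

Both sets of constants reduce to an equation of the form c_3 q''' + c_2 q'' + c_1 q' + q = c_1 e' + e, with coefficients close to 1.094, 5.73 and 4.63, so the gear-ratio approximations change the combined system by well under one per cent.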
Except for runs in group ID the machine was made as "tight" as possible,
the backlash being corrected by frontlash units. Due to the large
scale factors used and the high inherent precision of the integrators used
in the differential analyzer, the runs may be expected to be more accurate
than the actual deflection mechanism.
Solutions were taken in the form of both curves and counter readings.
The curves given here were reproduced by pantograph to ordinary graph
paper size. Curves not directly drawn by the machine and numerical values
quoted are taken from the counter printings, which give an additional
decimal place not readable from the curves.
Discussion of Runs
Most of the curves are given with \dot{q} as dependent variable. To estimate
the error in yards for a given error in \dot{q} from \dot{e}, the chart of Fig. 6A
may be used. This is computed from the approximate formula

    [left-hand expression illegible in scan] = r A(\dot{e}, \dot{q}) \Delta\dot{q}

For rough comparisons the coefficient A may be taken as 1, the error then
being the \dot{q} error multiplied by the predicted range.
The first set of runs taken were with a sudden impulse e = k 1(t) with
the system at rest, both with and without the middle part of the mechanism.
Runs were taken with

    k = 0.1, 0.2, 0.4, 1.0, 2.0

Typical curves are shown in Figs. 7 and 8. The results are very close to
computed curves on the assumption that 1/\sqrt{1 - \dot{q}^2} = 1 when k < .4, but above
this the non-linearity becomes appreciable. In the worst cases the transient
disappeared to within machine errors in 25 seconds, and for most
cases within 8 to 12 seconds. The action with the middle part out was
considerably more rapid than with it in, the transient being 6 times as
great, as had been predicted, this being a special case of a linear
forcing function. Fig. 9 is a plot of the time required for the transient
in \dot{q} to reduce to 2/10 of its maximum value. For values of k greater
than about .35 the curves cross the axis once with the middle part in.
The curves with it out are all identical with k > 2, due to the action
of the slip clutch on one integrator.
Next a series of runs were taken with

    e = k t 1(t)

starting from rest, with

    \sin^{-1} k = steady state \delta = 15°, 30°, 45°, 60°, 75°, [last value illegible in scan]

the last being the limit of the sine gear, the maximum possible deflection.
These runs are shown in Figs. 10 and 11. The transient died out in all
cases within 20 seconds except with x in for \delta > 75° in which cases 30
seconds or more was required, due to the action of the slip clutch. These
long transients, however, would probably not be troublesome since such
large deflections would only occur in practice with the plane almost directly
overhead. For the smaller values the response is about equally
rapid with x in or out.
Quadratic Forcing Functions
The runs with a quadratic forcing function

    e = at^2

were the first to show the superiority of the mechanism with x in. Runs
were taken with

    a = .01, .02, .03, .04, .10

With a quadratic rate finder the solution \dot{q} should approach 2at, and with
x in this was very nearly true, the discrepancy being due to the sine gear.
Some solutions are shown in Figs. 12, 13, and 14. The errors increase with
a and with \dot{q}. The maximum slope found in any of the 8 courses plotted is
about equivalent to an a of .05 so that the large errors due to the sine
gear with a = .10 need not cause great concern.
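The quadratic behaviour can be reproduced for the idealized linear system, leaving out the sine gear, the clutches and backlash. The sketch below assumes the mechanism equations x' + .54 x = .54 e' and q'' + 4.700 q' + 1.692 q = 1.692 e + 4.700 x, a start from rest, and a classical fourth-order Runge-Kutta step; the function name, step size and run length are illustrative choices.

```python
def simulate_quadratic(a, t_end=40.0, dt=0.01):
    """Return q' at t_end for the forcing function e = a t^2 (linear model only)."""
    def deriv(t, state):
        x, q, qdot = state
        e, edot = a * t * t, 2.0 * a * t
        return (0.54 * (edot - x),
                qdot,
                1.692 * e + 4.700 * x - 4.700 * qdot - 1.692 * q)

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    state, t = (0.0, 0.0, 0.0), 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(t, state)
        k2 = deriv(t + dt / 2, shifted(state, k1, dt / 2))
        k3 = deriv(t + dt / 2, shifted(state, k2, dt / 2))
        k4 = deriv(t + dt, shifted(state, k3, dt))
        state = tuple(s + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                      for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))
        t += dt
    return state[2]


a = 0.01
print(simulate_quadratic(a), 2 * a * 40.0)
```

Once the transient has decayed, the computed rate agrees with 2at, as expected of a second order rate finder. In the actual mechanism the sine gear adds the discrepancy described above.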
Cubic Forcing Functions
For cubic forcing functions the following were used

    e_1 = -.04 t^3 + .1 t^2
    e_2 = -.001 t^3 + .05 t^2
    e_3 = -.0002 t^3 + .02 t^2

These were chosen as having second order tangency at t = 0 so that the
transient is small. The results are shown in Figs. 15 and 16. The responses
with e_2 and especially e_3 are very close to the calculated values
on assuming the equation linear. The error in e_1 is somewhat greater as
in the quadratic case with higher acceleration.
Effect of Backlash
A number of runs were made to determine the effect of backlash using
several different forcing functions. In order to increase the amount of
backlash, frontlash units were inserted at several critical points in the
backwards direction. The results of these runs were, however, completely
negative, for no oscillation of any sort was discovered. The system was
given "shocks" by sudden turning of the e shaft and other methods, but the
solutions were completely stable. The only results were small consistent
errors, of the order of magnitude of the backlash. It is possible that
due to the large scale factors used in the set up, even the artificially
introduced backlash was not sufficient to cause the oscillation effect.
Response for Typical Courses
The response for the 8 courses described above are shown in Figs. 17
to 24. It may be noted that even on the flat courses (e.g., IV) the operation
is poor without x. On the flat courses the response is satisfactory
with x, the error being less than 20 yards except sometimes at the hump in
e. However for the steeper courses errors of 60 or more yards are common
after the start of the peak which do not disappear until nearly the end of
the course. The action is particularly bad coming down the hump. Fig. 25
is a plot of the error in yards with course VIII, x in.
Response to Error Functions
In Figs. 26 - 28 are shown the responses to some random error functions
of various kinds superimposed on courses I and II. The operation
in damping out the error is considerably better with x out. However it
seems from a consideration of the size of the errors introduced and the
responses found that the system, even with x in, damps the errors more
than necessary. That is, it might be preferable to increase the speed of
response so as to reduce the transient errors in the solutions.
Figs. 29 and 30 show the responses when we suddenly start tracking a
target in courses I or II with the machine previously at rest, with the
target at several points along the course.
Tests with Different Equations
Three runs were made on course VIII, the most difficult one of the
group, using three different cubic rate finding equations. The equations
used were (assuming linearity) critically damped, with the transfer
admittances

    (1)  [numerator illegible in scan], with denominator [1 + 2(j\omega)]^4
    (2)  \frac{1 + 4(j\omega) + 6(j\omega)^2}{[1 + (j\omega)]^4}
    (3)  [illegible in scan]

The results of these runs are shown in Figs. 31, 32, and 33 and
should be compared with Fig. 24. Of course, this gain is accompanied with
a loss in error function damping. With the roots equal to 2 the system
had a slight tendency to be unstable on the flat part of the course. This
however appeared to be due to the "human backlash" in the operator on the
sine table and would probably not be present with a sine gear.
It is easily seen that an increase in the values of the characteristic
roots of the equation demands a proportional increase in the power requirements
of the integrators. It may be that this will be a design limit in
the case of mechanical systems. No difficulty would be experienced here
however with electrical integrators.
The main conclusions of this work are as follows:
1. The middle part of the machine is definitely worth while.
Although it increases response for accidental following errors, the gain
in behavior for actual courses more than offsets this disadvantage.
2. The system behaves nearly enough like the linear system

    1.094 \dddot{q} + 5.73 \ddot{q} + 4.63 \dot{q} + q = 4.63 \dot{e} + e

that this may be used to calculate its response to within a few per cent,
providing \dot{q} < .6. As this corresponds to a deflection of 37°, the approximation
is sufficient for most cases.
3. For targets whose elevation at their nearest point is greater than
about 50° fairly large errors occur due to substantial cubic and higher
degree terms in e. This indicates that it might be worth while to use a
higher order rate finder. Tests made with a cubic rate finder showed
greatly improved results.
4. If the additional cost of another integrator and adder required
for cubic rate finding is too great to be justified it appears that the
system could be improved by reducing the time constants, for if sufficient
power is available from the integrators, the only disadvantage would be
increased response to random error functions and our results indicate that
they are now damped out more than necessary.
5. There is some indication that better results would be obtained
by making the three time constants equal, or more nearly equal than they
are now, although this is not certain.
Criteria for CcnaUtecoy and uniquenee* la R«lay circuit!
[>}
September ft, 1M1
In a system of linear algebraic equations, there
are three possible types of degeneracy, namely inconsistency
(no possible solution), ambiguity (solutions not uniquely
determined) and redundancy (more equations than necessary).
Necessary and sufficient conditions are known for these
types of degeneracy in terms of the ranks of the coefficient
and augmented matrices. Somewhat similar effects can occur
in the Boolean equations characterizing relay circuits, giving
rise respectively to chattering, ambiguity of relay position
for certain values of the independent variables, and redundancy
[two lines illegible]
discriminant $f$.
Consider a relay circuit containing $n$ relays
$X_1, X_2, \ldots, X_n$. Make and break contacts on $X_i$ are desig-
nated $x_i$ and $x_i'$, and we suppose that there are $m$ independent
variables $a_1, a_2, \ldots, a_m$, which do not depend on the relay
positions. Such a circuit is equivalent to the circuit of
Fig. 1 in which
$f_i(x_1, \ldots, x_n, a_1, \ldots, a_m)$
is the Boolean function which is zero when the switches
$x_1, \ldots, x_n, a_1, \ldots, a_m$ are in such positions that the volt-
age across $X_i$ in the original circuit is sufficient to oper-
ate it and one otherwise. The function
(1) [a sum over $i = 1, \ldots, n$; summand illegible]
will be called the circuit discriminant. We also define the
following terms. A steady state in a relay circuit corres-
ponding to a given set of values of the independent variables
$A_1, \ldots, A_m$ is a set of positions $P_1, P_2, \ldots, P_n$ of the
relays such that if the independent variables are given
the values $A_1, \ldots, A_m$ and the relays held in the positions
$P_1, \ldots, P_n$ long enough for the steady state fluxes in the
coils to build up, the relays will remain in the same posi-
tions indefinitely.
A completely oscillatory state of a relay circuit
is a set of values $A_1, A_2, \ldots, A_m$ of the independent variables,
such that no matter what the initial positions of the relays,
or how long they are held in that position, once they are re-
leased at least one makes an infinite number of oscillations,
i.e. chatters. In addition to these obviously exclusive possi-
bilities a circuit may be "partially" oscillatory for certain
values of the independent variables: with some initial condi-
tions the circuit chatters and with others relapses into a
steady state. An example is shown in Figure 2 where with
the initial conditions
[illegible] = 0 (operated)
the circuit chatters while with
[illegible]
the circuit relapses into the steady state [illegible] = 1, $R_2$ = 1.
Theorem I. For $P_1, \ldots, P_n$, $A_1, \ldots, A_m$ to be a steady
state it is necessary and sufficient that
$f(P_1, \ldots, P_n, A_1, \ldots, A_m) = 0$.
This is necessary since in a steady state the contacts of
relay [illegible]
or
[illegible]
so that
[illegible]
It is sufficient since
[illegible]
so that if the relays are held in these positions $P_i$ long
enough for fluxes to build up they will remain there.
Theorem II. For $A_1, \ldots, A_m$ to be completely oscillatory
it is necessary and sufficient that
$f(x_1, x_2, \ldots, x_n, A_1, \ldots, A_m) = 1$
identically in the $x_i$. This is necessary since other-
wise there is a set of $x_i$, say $P_i$, such that $f = 0$ and
this is a steady state by Theorem I. It is sufficient
since if true then with any starting position, say
$P_1, \ldots, P_n$, at least one term of the sum (1), say [illegible],
is equal to one, so that
[illegible]
and one or the other [illegible]. Hence [illegible]. After some relay has
changed we still have the same situation since $f = 1$, so
that at least one relay makes an infinite number of changes
of position.
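The two theorems can be checked by brute force on small circuits. The sketch below is only an illustration under stated assumptions: it reads the definitions above as saying that a position is steady exactly when every relay position agrees with its function (0 for operated, 1 for unoperated), and the two example circuits (`buzzer`, `latch`) are hypothetical and do not come from the memorandum.

```python
from itertools import product

def steady_states(relay_functions, inputs):
    """Return all relay positions P with P[i] == f_i(P, inputs) for every relay.

    Convention assumed here: 0 = operated (or closed), 1 = unoperated (or open).
    """
    n = len(relay_functions)
    return [
        positions
        for positions in product((0, 1), repeat=n)
        if all(f(positions, inputs) == positions[i]
               for i, f in enumerate(relay_functions))
    ]

def is_completely_oscillatory(relay_functions, inputs):
    """True when no relay position is steady for these input values."""
    return not steady_states(relay_functions, inputs)

# Hypothetical buzzer: one relay fed through its own break contact,
# so it receives operating voltage (f = 0) only while unoperated (x = 1).
buzzer = [lambda x, a: 1 - x[0]]

# Hypothetical latch: one relay fed through its own make contact in
# parallel with an input contact a (parallel connection = product).
latch = [lambda x, a: x[0] * a[0]]

print(is_completely_oscillatory(buzzer, ()))   # no steady position exists
print(steady_states(latch, (1,)))              # two steady positions: ambiguity
print(steady_states(latch, (0,)))              # input closed: relay must operate
```

The latch with its input open has two steady positions, which is the ambiguity of relay position mentioned in the opening paragraph; the buzzer has none and therefore chatters.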
In case $f(x_1, \ldots, x_n, A_1, \ldots, A_m)$ is a function
of the $x_i$ (not identically one or zero) the system has
some steady states, namely the roots of $f = 0$, but for
arbitrary starting conditions we cannot say what the motion
will be. Whether a circuit seeks out a steady state or not
depends not only on the network topology as in Fig. 2, but
also on relay characteristics as in Fig. 3. Here if $R_1$ is
slow operating and $R_2$ very fast the circuit may chatter
with both relays initially unoperated, for $R_2$ may never
stay in long enough to operate $R_1$. If $R_1$ is fast and
$R_2$ slow release, the system relapses into $R_1 = 0$, $R_2 = 1$.
Hence no purely algebraic conditions can be set up to deter-
mine whether a circuit will relapse into a steady state when
$f$ is a function of the $x_i$.
July 15, 1943
Copy No.
ON THE INTEGRATION OF THE
BALLISTIC EQUATIONS ON THE ABERDEEN ANALYZER
by
Professor W. Feller of Brown University and
Dr. C. E. Shannon of the Bell Telephone Laboratories
AMP REPORT NO. 28.1
APPLIED MATHEMATICS PANEL
NATIONAL DEFENSE RESEARCH COMMITTEE
This is a report on investigations made at the request
of Dr. Warren Weaver (letter of December 28, 1942). Our study
has been based partly on oral information received in Aberdeen
(January 18, 1942) and partly on the material contained in the
Report No. 319 of the Ballistic Research Laboratory ("Report
on the Differential Analyzer at Aberdeen Proving Ground" by
Major A. A. Bennett, December 1942). The technical set-up
as described in that report will in the sequel be referred to
as "present set-up". It should be clearly understood that we
were not to study possible technical improvements of the ana-
lyzer as such nor to reexamine the theory underlying the dif-
ferential equations. Accordingly, the present report is con-
cerned only with an examination of the procedure of mechanical
integration of the differential equations of ballistics as
used at present. Furthermore, we have not considered any methods
of integration other than on the differential analyzer.
Before proceeding to describe devices which might
contribute to the efficiency of the analyzer we wish to summarize
some negative findings, as these may render superfluous similar
investigations by other persons.
a) We have carefully investigated a great number of
alternative set-ups, on the differential analyzer, of the dif-
ferential equations either in their present form or using
various new variables. However, we have been unable to find
any form superior to the method as used at present in Aberdeen
which, in our opinion, is the most efficient one.
b) We have studied the advisability of using some
method of successive approximations. Such methods naturally
present themselves since one should expect them to reduce the
ranges of the variables involved and thus increase the accuracy.
However, a closer study will show that it is almost invariably
necessary to subtract, on the analyzer, two large quantities
which are themselves independently obtained on the analyzer.
This, of course, nullifies the desired effect of reducing the
ranges. Various possibilities have been studied and, among
them, the possibility of starting with the vacuum trajectories
and integrating the difference between them and the actual
trajectories. Again we were unable to find a method which
would appear superior to the present set-up. It will be noted,
however, that the modification of the latter suggested below
can in some sense be interpreted as the first step in a method
of successive approximations.
c) Several perturbation methods and expansions
according to various parameters have been tried paying special
attention to methods suggested in the newest Russian literature.
None of these methods seem appropriate for the analyzer.
Coming to the less negative part of this report we
remark that an adequate theory of errors of the differential
analyzer is not available at present. However, simple theoretical
considerations based on experience gathered at M.I.T. make it
appear that a very considerable part of the total error is due
[...] of error are backlash and, perhaps even more, inaccuracies in
the following mechanism for the input and vector tables. It
seems therefore possible to achieve a gain in accuracy by re-
ducing the range of the variables in the integrators, even
though this may necessitate the introduction of new adders
and gears. The following recommendations are based on this
assumption. We proceed step by step, starting with the simplest
case.
Recommendations.
1) Consider, to begin with, the horizontal displace-
ment $x$. Obviously $dx/dt$ will range from its maximum $\dot{x}_0$ at
the beginning to some fraction of it, say $q\dot{x}_0$, at the end.
Accordingly, when integrating in the usual form
(1) $x = \int \dot{x}\,dt$
the integrand ranges from $q\dot{x}_0$ to $\dot{x}_0$. Now this means that
only a fraction $\frac{1-q}{2}$ of the total range of the integrator
disc is used even if we suppose that the scale factor has been
chosen in the best way (so that the rim of the integrator disc
is used for values of $\dot{x}$ near $\dot{x}_0$). If, instead, we write
(2) $x - \frac{1+q}{2}\dot{x}_0 t = \int \left(\dot{x} - \frac{1+q}{2}\dot{x}_0\right) dt$,
the integrand will range from its maximum $\frac{1-q}{2}\dot{x}_0$ to its
minimum $-\frac{1-q}{2}\dot{x}_0$. This allows one to use a scale factor
$\frac{2}{1-q}$ times as large as in the set-up (1) and to utilize
the entire integrator disc. This, of course, means a consider-
able gain.
Now the constant $\frac{1+q}{2}\dot{x}_0$ in the integral in (2)
appears only as an initial displacement. It is therefore seen
that the realization of the proposed set-up (2) requires, as
compared with the customary set-up (1), an additional gear (to
produce $\frac{1+q}{2}\dot{x}_0 t$) and an adder. The following figure shows
the simplest mechanization.
[Figure: mechanization of set-up (2). Shaft labels: $\dot{x}$, $x$, $\frac{1+q}{2}\dot{x}_0$, $x - \frac{1+q}{2}\dot{x}_0 t$, $t$.]
It goes without saying that the gear ratio does not need to
be exactly $\frac{1+q}{2}\dot{x}_0$; any number near the middle of the range
of the integrand will do the same service.
If used to its fullest extent, the system as described
changes a previously positive variable into one taking on also
negative values. Although only one change of sign is introduced
this will introduce some new backlash. Now, if instead of (2)
we mechanize
(3) $x - q\dot{x}_0 t = \int (\dot{x} - q\dot{x}_0)\,dt$,
the new integrand does not change sign, and no new backlash is
introduced. On the other hand, the optimum scale factor for
(3) is only $\frac{1}{1-q}$ times that for (1), that is to say half the
scale factor for (2). We conclude that with proper corrections
for backlash the set-up (2) should prove best. However, if
enough frontlash units are not available at Aberdeen, the set-
up (3) may be tried with advantage.
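The comparison of the three set-ups can be checked numerically. The sketch below is an illustration only: the function names and the sample values of $q$ and $\dot{x}_0$ are assumptions, and the "gain" is taken to be the ratio of the largest integrand magnitude in set-up (1) to that in the modified set-up.

```python
def integrand_range(q, x0_dot, offset):
    """Range of (x_dot - offset) when x_dot runs from q*x0_dot to x0_dot."""
    return q * x0_dot - offset, x0_dot - offset

def scale_gain(q, x0_dot, offset):
    """Scale factor relative to set-up (1), whose largest integrand is x0_dot."""
    low, high = integrand_range(q, x0_dot, offset)
    return x0_dot / max(abs(low), abs(high))

def changes_sign(q, x0_dot, offset):
    """True when the shifted integrand takes both signs (new backlash)."""
    low, high = integrand_range(q, x0_dot, offset)
    return low < 0 < high

q, x0_dot = 0.5, 800.0                 # hypothetical sample values
setups = {
    "(1)": 0.0,                        # no constant subtracted
    "(2)": (1 + q) / 2 * x0_dot,       # middle of the range subtracted
    "(3)": q * x0_dot,                 # minimum of the range subtracted
}
for name, offset in setups.items():
    print(name, scale_gain(q, x0_dot, offset), changes_sign(q, x0_dot, offset))
```

With these values set-up (2) gives the factor $2/(1-q)$ at the cost of a sign change, and set-up (3) gives $1/(1-q)$ with no sign change, in agreement with the text.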
2) A similar device can obviously be used wherever
the range of the integrand does not utilize the integrator
disc to its fullest extent. This is true for almost all
integrators whose outputs are:
(i) the horizontal displacement $x$,
(ii) $s = \int v\,dt$, $v$ being the speed,
(iii) $e^{-hy}$, where $y$ is the height.
In the first two cases the new set-up would not produce any
additional loading since the integrators are driven by the
independent variable-motor. In other cases an additional
loading would ensue which may have to be compensated by the
use of a larger scale factor on the t-shaft; this would in-
directly slow down the machine. Whether this will have to be
done is impossible to predict theoretically. Should it prove
necessary, it would be for the user to decide whether the gain
in accuracy is worth the loss in speed.
3) If the above described device should prove in[illegible]
[illegible] at the expense of
[illegible]
considerable manual work and loss [illegible]. The process of
integration may be stopped at convenient [illegible] and the proce-
dure [illegible]
intervals. Consider, for example, [illegible]
indicated in the figure [illegible].
[Figure: curve of the integrand; labels not recoverable.]
Here, even the usual procedure of integration utilizes the
entire range of the integrator disc and no gain can be achieved
by means of the device as described above. However, the integrand
may conveniently be treated by a double application of this
device, splitting the interval of integration into two parts.
In other words, instead of a given function $f(x)$ we integrate
the difference between $f(x)$ and a step-function. The output
of the integrator is no longer $F(x) = \int f(x)\,dx$ but the
difference between $F(x)$ and a triangular (or "roof") function.
[Figure: $f(x)$ together with the subtracted step-function.]
Similarly, with a convenient subdivision we may use any step-
function for the integrand and the corresponding polygonal
line for the integral.
This procedure obviously requires resetting the
integrator in question and changing one gear ratio each time
the machine is stopped. On the other hand, the increase of the
scale factor is roughly proportional to the number of subintervals.
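The claim that the scale factor grows roughly in proportion to the number of subintervals can be illustrated with a small calculation. This is a sketch under assumptions not in the report: the integrand is taken to be monotone, the interval is split into equal parts, and on each part the subtracted step is the middle of the integrand's range there.

```python
def residual_peak(f, a, b, k):
    """Largest |f - step| over [a, b] split into k equal parts.

    Assumes f is monotone, so on each part the extreme values sit at the
    end points and the best constant step is their mean.
    """
    width = (b - a) / k
    peak = 0.0
    for i in range(k):
        left = f(a + i * width)
        right = f(a + (i + 1) * width)
        peak = max(peak, abs(right - left) / 2)
    return peak

def gain_over_single_step(f, a, b, k):
    """Scale factor with k steps relative to a single constant step."""
    return residual_peak(f, a, b, 1) / residual_peak(f, a, b, k)

ramp = lambda x: x                      # hypothetical integrand
for k in (1, 2, 4, 8):
    print(k, gain_over_single_step(ramp, 0.0, 1.0, k))
```

For a linear integrand the gain equals the number of subintervals exactly; for other monotone integrands it is only roughly proportional, as the text says.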
4) In principle this procedure may be looked upon
as a special case of the following more general method. Instead
of
(4) $w(x) = \int y\,dx$
write
(5) $w(x) + \phi(x) = \int (y + \phi')\,dx$,
where $\phi(x)$ is an arbitrary function and $\phi'(x)$ its derivative.
In practice, of course, $\phi(x)$ should be chosen so as to render
the maximum of $|y + \phi'|$ as small as possible in order to in-
crease the scale factor on the integrator. Now if $\phi(x)$ is
not a linear function, the mechanization of (5) would require
two new input tables or their equivalent. However, the possi-
bility of obtaining some special $\phi(x)$ by means of non-circular
gears should not be overlooked. This would mean a considerable
RESTRICTED
improvement of the linear method.
5) We have been asked by Dr. Dederick to consider
whether it would be advantageous to generate from an
input table (instead of by integration, as at present). The
foregoing remarks contain an answer to this question. It is
not difficult to see that the present method of obtaining the
function by integration is more efficient. It would probably
become even more so if the recommendation 2) were put into
effect.
6) Although it is in no direct connection with the
subject of this report, we enclose an Appendix describing a
simplified method for computing gear ratios. This method is
based on previous experience (of one of us) at M.I.T. and may
prove useful in connection with ballistic work on the Aberdeen
Analyzer.
Brown University, Providence, R.I.
and
Bell Telephone Laboratories, N.Y.
May 27, 1943.
W. Feller
C.E. Shannon
A METHOD OF DETERMINING GEAR RATIOS
In this appendix a simplified method of determining
gear ratios for an analyzer set up will be described which
was used for some time on the M.I.T. analyzer and proved in
general to be considerably faster and easier to change than
the original method of equalities and inequalities. The
method may be briefly outlined as follows:
1. Draw the set up with an unknown gear ratio in
each shaft of limited displacement. An unspecified
ratio is also placed in the two inputs of each adder.
2. Calculate an approximate scale factor on the
independent variable to give the expected time of
solution at the average rate at which it turns.
Choose an exact scale factor near this approximate
one which is a "round figure" in terms of obtain-
able gear ratios - i.e., factorable into a small
number of simple rationals.
3. Choose in the same way scale factors for all
shafts of limited displacement - integrator inputs
and function table inputs, and outputs - so as not
to exceed their limits with expected displacements.
4. This fixes p by division, and from the integrating
factor of the integrators, the scale factors and
gear ratios of all shafts except those containing
adders. In the case of adders the input shaft with
smallest scale factor fixes the scale factor of the
adder, the other input being geared down to the same
scale factor. The output gear in the adder is then
fixed.
5. The set up is then inspected to see that no
integrators or other parts are too heavily loaded.
If they are, reduction gears are transferred from
inputs to outputs to reduce loads when possible,
otherwise the scale factor on the independent
variable is increased.
In case the ratios come out too complicated dif-
ferent scale factors are chosen in Step 3. With a little
practice and foresight, however, it is possible to obtain
suitable ratios on the first trial.
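Step 2 of the method, choosing a "round figure" near an approximate scale factor, can be sketched as a small search. Everything specific here is an assumption for illustration: the set of obtainable simple ratios (p/q with p and q from 1 to 5), the limit of two factors, and the target value are hypothetical and are not taken from the appendix.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def simple_ratios(limit=5):
    """Hypothetical stock of gear ratios p/q with 1 <= p, q <= limit."""
    return sorted({Fraction(p, q)
                   for p in range(1, limit + 1)
                   for q in range(1, limit + 1)})

def round_figure(target, max_factors=2, limit=5):
    """Closest product of at most max_factors simple ratios to the target.

    Returns the product and the tuple of ratios that forms it.
    """
    best = None
    ratios = simple_ratios(limit)
    for count in range(1, max_factors + 1):
        for combo in combinations_with_replacement(ratios, count):
            value = Fraction(1)
            for ratio in combo:
                value *= ratio
            error = abs(value - target)
            if best is None or error < best[0]:
                best = (error, value, combo)
    return best[1], best[2]

value, factors = round_figure(Fraction(21, 50))   # approximate factor 0.42
print(value, factors)
```

Exact rational arithmetic keeps the chosen scale factor expressible as gear trains; a real analyzer would substitute its own list of available gears for `simple_ratios`.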
Two New Circuits for Alternate Pulse Counting
The well known W-Z relay circuit is shown in
Fig. 1. A is a pulsing contact which is alternately opened
and closed. Indicating closure of contacts by 0 and open-
ness by 1 and for relays 0 for operated (up) and 1 for
unoperated (down) the circuit goes through the following
periodic cycle of operation:
    A   W   Z
    1   1   1
    0   0   1
    1   0   0
    0   1   0
    1   1   1
Thus one complete cycle requires two complete pulses on A.
This note describes two apparently new circuits
which perform the same function. These are shown in Fig. 2
and Fig. 3. The operating cycles for these are:
       Fig. 2              Fig. 3
    A   W   Z           A   W   Z
    1   0   1           1   1   1
    0   0   0           0   0   1
    1   1   0           1   0   0
    0   1   1           0   1   0
These three circuits may be compared with regard
to the number of elements required as follows:
            Relays   Contacts                    Resistances
Figure 1      2      1 continuity, 1 transfer         2
Figure 2      2      2 continuity, 1 break            1
Figure 3      2      2 transfer, 1 make               1
In Fig. 3 the resistance is theoretically superfluous;
if the transfer elements could be trusted never to be shorted
it could be omitted, but in practice would be necessary to
avoid shorts when the relays were being adjusted. Figs. 2 and
3 are essentially duals, and 3 was obtained from 2 by the
duality theorem.
In Fig. 2 it may be noted that the two relays are
up when A is closed, while in the standard circuit they are both
up when A is open. This might be desirable in some applications.
Fig. 3 has the possible disadvantage that both ends of the
pulsing contact A are connected into the circuit, while in 1
and 2 one end can be grounded.
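The operating cycles can be checked mechanically. The sketch below encodes the three cycles as read from the tables above (the row-by-row reading of the tables is itself an assumption about the original layout) and verifies two properties: a full cycle takes two pulses on A, and every change of A is followed by a change of exactly one relay.

```python
# Each state is (A, W, Z); 0 = closed / operated, 1 = open / unoperated.
CYCLES = {
    "Fig. 1": [(1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)],
    "Fig. 2": [(1, 0, 1), (0, 0, 0), (1, 1, 0), (0, 1, 1)],
    "Fig. 3": [(1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)],
}

def pulses_per_cycle(cycle):
    """Number of closures of the pulsing contact A in one full cycle."""
    return sum(1 for a, _, _ in cycle if a == 0)

def single_relay_steps(cycle):
    """True if A alternates and exactly one relay changes at every step."""
    following = cycle[1:] + cycle[:1]            # wrap around to the start
    for (a0, w0, z0), (a1, w1, z1) in zip(cycle, following):
        if a0 == a1:
            return False
        if (w0 != w1) + (z0 != z1) != 1:
            return False
    return True

for name, cycle in CYCLES.items():
    print(name, pulses_per_cycle(cycle), single_relay_steps(cycle))
```

Both relays of Fig. 2 are operated in the state (0, 0, 0), where A is closed, while in Fig. 1 they are both operated in (1, 0, 0), where A is open, which matches the remark in the text.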
C. E. SHANNON
Att.
Figs. 1, 2, 3
[Figures 1, 2 and 3: relay circuit diagrams showing the pulsing contact A, relays W and Z, and their continuity (CONT.) and transfer (TRANS.) contacts.]
Counting Up or Down with Pulse Counters

With binary counters of either relay or electronic
type it is possible by simple modification to count both up and
down. Suppose the largest number that can be registered is [illegible].
Defining the complement of any number as its difference from this
largest number, we note that subtracting a number from a reading is
equivalent to adding it to the complement of the reading and then
taking the complement of the result. Thus if in a binary counter
we take the complement of a reading (which means locking up the
relays which are down and vice versa in the relay case, and
putting out the tubes which are conducting and vice versa in the
electronic case) and then let the counter continue to add the number
of pulses in question, and finally take the complement again, we
have subtracted the number. Actually, however, this process can
be done simply by transferring the carryover leads to the opposite
digit (tube or relay). In the relay case this amounts to a transfer
contact between each adjacent pair of digits, and an additional
[illegible] contact. In the electronic case the carryover leads go from
the [illegible] tube plate to [illegible] on the next stage. Here we could
insert an electronic transfer contact, as shown, for example, in
Figure 1. When we wish to add, the common control lead for "add"
is given cutoff voltage, the "subtract" lead a large negative vol-
tage. A positive impulse on the "one" plate of a stage then causes
one side of the double triode to conduct, giving a negative impulse
to the next grid, for a carryover. For subtraction the voltages
on the control leads are reversed and carryover occurs when the
"zero" plate voltage increases, i.e., when this tube goes out.

C. E. SHANNON
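The complement argument in the note above can be tried out on a small model of a binary counter. This is a sketch under assumptions: the counter is modeled as an integer that wraps around after its largest reading, the symbol S for that largest reading and all function names are introduced here for illustration.

```python
def complement(reading, bits):
    """Complement with respect to the largest reading S = 2**bits - 1."""
    largest = 2 ** bits - 1
    return largest - reading

def count_up(reading, pulses, bits):
    """Ordinary counting: add pulses, wrapping around past the largest reading."""
    return (reading + pulses) % (2 ** bits)

def count_down(reading, pulses, bits):
    """Subtract by complementing, counting up, and complementing again."""
    flipped = complement(reading, bits)
    advanced = count_up(flipped, pulses, bits)
    return complement(advanced, bits)

print(count_down(13, 5, bits=4))    # 13 - 5
print(count_down(3, 5, bits=4))     # wraps around below zero
```

Complementing a binary reading inverts every digit, which is why the note can replace the two complement operations by moving each carryover lead to the opposite side of its stage.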
Cover Sheet for Technical Memoranda
Research Department
SUBJECT: Circuits for a P.C.M. Transmitter and Receiver -
Case 20878
ROUTING:
1 - S.A.S., H.W.B., H.F.
2 - CASE FILES
  - G.W.Gilman
5 - H.W.Bode
6 - A.G.Jensen
7 - W.M.Goodall
8 - E.Peterson
9 - H.S.Black
10 - W.F.Simpson - Patent Dept.
11 - J.R.Pierce
12 - R.L.Dietzold
13 - C.B.Feldman
14 - W.T.Wintringham
15 - F.B.Llewellyn
16 - C.H.Elmendorf
17 - B.M.Oliver
18 - C.E.Shannon
MM 44-110-37
DATE June 1, 1944
AUTHORS C.E.Shannon and B.M.Oliver
ABSTRACT
Circuits are described for a P.C.M. transmitter
and receiver. The transmitter operates on the principle
of counting in the binary system the number of quanta
of charge required to nullify the sampled voltage.
Circuits for a P.C.M. Transmitter and Receiver - Case 20878
MM-44-110-37
June 1, 1944
MEMORANDUM FOR FILE
The circuits shown in the present memorandum are
intended to fill the boxes of the block functional designs
for a PCM transmitter and receiver shown in Fig. 6 of a December
1943 memorandum (MM-43-110-43). The transmitter functional
diagram is shown here as Fig. 1 and the general operation
is as follows. The incoming signal is sampled periodically
by closing the electronic switch 1 with periodic impulses
from the timer. This charges condenser C to the sampled
voltage and the electronic switch opens after each impulse
isolating the condenser from the signal. The existence of
a voltage across the condenser causes the comparator to close
electronic switch 2 which allows pulses of charge to feed
into the condenser from the pulse generator, discharging the
condenser. The number of these pulses is counted in the
binary system by the binary counter and when the condenser
is reduced to a reference voltage, the comparator opens elec-
tronic switch 2. Near the end of the sampling period the
binary counter is connected to the distributer which registers
the binary number counted, and the counter is then reset to
zero; both of these operations controlled by impulses from the
timer. The distributer then sends a series of pulses or not
down the output line according as the binary digits are
1 or 0. These digits are sent in reverse order, the least
important being sent first, to tie in with the contemplated
receiver circuit.
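The counting principle of the transmitter can be sketched in a few lines. This is a simplified model under assumptions not stated in the memorandum: the reference voltage is taken as zero, every pulse removes the same quantum of voltage, the counter has five digits, and the function name and sample values are hypothetical.

```python
def encode_sample(voltage, quantum, digits=5):
    """Count the quanta needed to bring the condenser down to the reference.

    Returns the binary digits of the count, least important digit first,
    which is the order in which the distributer sends them.
    """
    count = 0
    largest = 2 ** digits - 1
    while voltage > 0 and count < largest:   # comparator keeps switch 2 closed
        voltage -= quantum                   # one pulse of charge removed
        count += 1                           # binary counter advances
    return [(count >> k) & 1 for k in range(digits)]

print(encode_sample(2.75, 0.5))   # six quanta are needed to cross zero
```

The count stops at the first step that crosses the reference, so the sample is effectively rounded up to a whole number of quanta.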
The specific circuits are shown in Figs. 2 to 8, and
detailed descriptions of their operation follow.
Fig. 2 shows the electronic switch 1 which charges the
condenser C to the signal voltage at the sampling times. The
signal wave is biased up so that its minimum value is slightly
positive, and impressed on terminal 1 as a voltage; i.e., the
signal source as seen from terminal 1 is assumed to be of low
impedance. The timer, at the sampling time puts a positive
pulse on terminal 2, which is inverted by the triode to give
a negative pulse on the pentode control grid. This causes the
pentode which was previously conducting to cut off. Before
the pulse condenser C had a small minimum positive charge
and neither diode was conducting since the plates were held
at a low positive potential by the pentode current. As the
THIS DOCUMENT CONTAINS INFORMATION AFFECTING THE
NATIONAL DEFENSE OF THE UNITED STATES WITHIN THE MEAN-
ING OF THE ESPIONAGE ACT, 50 U. S. C., 31 AND 32. ITS TRANS-
MISSION OR THE REVELATION OF ITS CONTENTS IN ANY MANNER
TO AN UNAUTHORIZED PERSON IS PROHIBITED BY LAW.
pentode cuts off, the diode plates swing positive and the right
hand diode starts to conduct, charging the condenser. As this
condenser voltage builds up exponentially the voltage on the
diode plates also increases positively until it reaches the
signal voltage and at that instant the left hand diode starts
to conduct. The voltage stops rising at this point since the
plates are now essentially short circuited to the low impedance
signal source. This all occurs during the timing pulse, and
at the end of this pulse the pentode again starts conducting,
dropping the diode plates to a small positive voltage, less
than the minimum signal voltage, and isolating the condenser.
Fig. 3 shows a standard multi-vibrator circuit for
giving a series of square pulses. The coil condenser cross
connection of plates to grids causes the grid transient to
be a cosine curve which crosses the cut off grid voltage at
a time determined essentially by the LC product and independent
of amplitude changes due to variations in plate supply, etc.
As this point determines the period of oscillation, the
oscillator has good frequency stability. The output appears
on terminal 6 as a square wave.
Fig. 4 is the comparator, which is actually only a
differential amplifier with sufficient gain so that the
granularity voltage applied to the input is capable of
driving the amplifier from saturation in one direction to
saturation in the other. The input is the voltage on condenser
C which immediately after a sampling instant, will be at the
sampled signal voltage. This voltage starts decreasing by
steps as the condenser is discharged and when the condenser
voltage applied to terminal 3 moves down the step which crosses
the differential amplifier threshold, the amplifier swings from
saturation with output terminal 5 at nearly zero voltage to
a high negative voltage.
The electronic switch 2 is shown in Fig. 5. This
circuit sends units of charge into the condenser through
terminal 3 under the control of the comparator output coming
in on terminal 5. The multi-vibrator output is connected to
terminal 6 and the output of the multi-grid tube will be a
square wave when 5 is positive, which ceases when the
comparator swings to the other saturation point driving the
voltage on 5 in the negative direction. The double diode
connection gives a pump action. When the plate voltage of
the multi-grid tube increases to the upper part of the square
wave, the charge flows into the condenser from terminal 4
through the left diode. During the lower part of this wave
the condenser discharges through the right diode out into the
condenser C, via terminal 3. As this causes the potential of
3 to decrease gradually down a step function, it is necessary
for the input voltage at 4 to decrease similarly; otherwise
the difference in voltage between 3 and 4 would cause the size
of quanta to decrease gradually. This lowering of the voltage
on 4 is accomplished by a cathode follower arrangement on the
first cathodes in the comparator, which follow the step voltage
down.
The binary counter is shown in Fig. 6. The descending
step voltage which appears on condenser C is applied to the
input of this circuit through terminal 3. The input resistance
condenser combination serves as a differentiating circuit (the
time constant fairly small compared to the time between steps)
so that the voltage applied to the first grid of the double
triode consists of a series of negative spikes. The double
triode is simply a two stage resistance coupled amplifier, and
its output feeds the binary counter digit tubes. This circuit
is of standard type with two pentodes in each stage and there
are two stable points for each stage, one with the upper tube
cut off and the lower tube conducting, and the other, the con-
verse situation. A negative impulse from a preceding stage
applied through the coupling condensers changes the state from
the previous stable condition to the opposite one. This impulse
is applied symmetrically to both suppressors, but the condenser
across the cathode resistances, charged in one direction from
the previous state, biases the choice of the next state toward
the opposite one. The control grids of the "zero" tubes (the
upper row which are conducting when the corresponding binary
digits are zero) are connected to a common control lead which
is used to reset the reading to zero after the reading is reg-
istered by the distributor. This is accomplished by a neg-
ative impulse from the timer. The outputs to the distributer
are taken off the plates of the "unit" tubes.
The distributer is shown in Fig. 7. After the
number of quanta of charge has been counted in the binary
counter, the leads 11, 12, 13, 14, 15 will have either low
positive voltages or B+, according as the corresponding digit
is one or zero. The grids of the left triode, will then be
either negative or positive from the potentiometer action
to the negative voltage C-. To register the counter reading,
a positive pulse from the timer is applied to the control
grid of the common pentode allowing it to conduct and pulling
the cathode of the left triode and the diode in all stages
negatively. If a digit is zero, the potential of the cathodes
in that stage stops at a positive value due to current through
the triode and the diode does not conduct. If the digit is
one the cathodes are pulled negative and the corresponding
condenser $C_0$ is discharged through the diode and pentode.
At the end of the registering pulse, the cathodes go positive
again, isolating each $C_0$, with the digit registered as
presence or absence of charge. The reading is taken off the
series of condensers $C_0$ in sequence by positive pulses from
the timer on leads 21, 22, 23, 24, 25. These pulses allow
the right hand triodes to conduct and each $C_0$ in turn to
charge through the output lead, leaving them in the normal
state (at a voltage about equal to the pulse voltage). If
the digit is "zero" no charge of $C_0$ from the output lead
occurs. Thus negative pulses appear on the output when and
only when the registered digits are one.
The timer system is shown in Fig. 8. An oscillator
which may be synchronized subharmonically with the pulse
generating multi-vibrator, operates at the sampling frequency.
This passes through the clipper amplifier to give a square
wave, which is differentiated to give alternating positive
and negative spikes. A second clipper amplifier eliminates
the negative spikes and makes the positive ones rectangular.
These short rectangular pulses are fed into a delay line
terminated in its characteristic impedance. The timing pulses
needed for the various circuit functions are tapped off at
the appropriate places as indicated. A synchronizing pulse
may also be taken off the same delay line.
Fig. 9 shows the receiver circuit. The signal
passes through the clipping amplifier which is adjusted to give
a saturation voltage on the output if a pulse is present and
none if absent. This output is applied to the grid of a
multigrid pentode, whose other control grid is given positive
gating pulses at the center of the digit intervals. These
gating pulses allow the pentode to conduct if a pulse is present
and the plate current is then independent of the plate voltage
(providing this stays within certain limits) so that if a
pulse is present, a fixed amount of charge (equal to the
length of the gate times the pentode current) flows onto the
condenser. The time constant of the R C system (including the
pentode load resistance) is adjusted to allow the voltage to
restore itself halfway toward the equilibrium value in the
time from one digit to the next, so that after all pulses
have been collected on the condenser, the charge contributions
of the first, second, third etc. have decayed by factors of
[illegible], ..., 1. At this time a positive gating pulse is put
on the grid of the second pentode, allowing the condenser to
discharge rapidly into the low pass filter. The timer system
can be realized with the systems shown in either Fig. 10 or
Fig. 11.
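The decoding action of the receiver condenser can be modeled in a few lines. This is an idealized sketch: the decay is taken to be exactly one half per digit interval, each received pulse adds the same unit of charge, and the function name and the five-digit example are assumptions made for illustration.

```python
def decode_pulses(digits_lsb_first, unit_charge=1.0):
    """Charge left on the condenser after all digit intervals.

    The least important digit arrives first, so its contribution is
    halved the most times before the condenser is read out.
    """
    charge = 0.0
    for index, digit in enumerate(digits_lsb_first):
        if index > 0:
            charge *= 0.5                 # halfway decay between digits
        charge += unit_charge * digit     # fixed charge if a pulse is present
    return charge

received = [0, 1, 1, 0, 0]                # the count 6, least important first
print(decode_pulses(received))
```

With five digits the final charge is the transmitted count divided by 16, so the halving decay supplies the binary weights without any separate weighting network.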
C. E. SHANNON
B. M. OLIVER
Att.
Figs. 1 to 11
Pulse Shape to Minimize Band Width with Nonoverlapping Pulses

We consider the problem of shaping pulses $\phi(t)$ which
are zero outside $(-L, L)$ in such a way as to minimize the band
width of the power spectrum of the ensemble of functions formed
by sending a sequence of the functions $\phi(t)$ and 0, with spacing
[illegible], the probabilities of either being 1/2.
First a theorem will be proved on the spectrum of
such ensembles of functions.

Theorem: Let an ensemble of functions be defined by
[equation illegible]
where the $a_n$ are chosen independently and are equally likely to
be one or zero. The power spectrum of $f(t)$ then consists of
two parts, a point spectrum consisting of the spectrum of
[expression illegible], i.e. the spectrum of $\phi(t)$ repeated, and a contin-
uous part consisting of the energy spectrum of $\phi(t)$.

Consider the autocorrelation of $f(t)$
[equations illegible]
The integrand can be written
[equations illegible]
When we average, the sum of the first two parts gives the auto-
correlation of the function [illegible], since the coefficients
$a_n a_m$ ($n \neq m$) have one chance in four of being both equal to one,
and in the second term [illegible].
The last term in the limit reduces to
[equation illegible]
[illegible] compensates for the number of terms.
These two parts give the discrete and continuous parts
of the spectrum, the first being the autocorrelation of $\phi(t)$
repeated and the second giving the energy spectrum of $\phi(t)$.
In case $\phi(t) = 0$ outside $(-L, L)$ the discrete part has
power at [illegible] = 0, 1, 2, 3, ... according to
[equation illegible]
Suppose we wish to shape $\phi(t)$ lying within $(-L, L)$ in
such a way as to minimize the band spread of the spectrum as
measured by
[equation illegible]
The contributions of the two parts of the spectrum can be added,
and that from the discrete part is
[equation illegible]
For the continuous part, using the theorem that [equation illegible],
where $f(t)$ and $F(\omega)$ are Fourier transforms, we have
[equation illegible]
i.e., the same as the discrete contribution. The total is therefore
[equation illegible]
To minimize [illegible] with a fixed total energy per pulse
and with boundary conditions [illegible] = 0 we must obviously
place all the energy in the first term, a cosine curve displaced
to be tangent to the time axis.
[Figure: the resulting pulse shape $\phi(t)$.]
Cover sheet for technical memoranda
Research Department
SUBJECT: A Mathematical Theory of Cryptography - Case 20878 (4)
ROUTING:
 1 - [illegible] Case Files
 2 - Case Files
 3 - [illegible]
 4 - [illegible]
 5 - H. S. Black
 6 - F. B. Llewellyn
 7 - H. Nyquist
 8 - B. M. Oliver
 9 - R. K. Potter
10 - C. B. H. Feldman
11 - R. C. Mathes
12 - R. V. L. Hartley
13 - J. R. Pierce
14 - H. W. Bode
15 - R. L. Dietzold
16 - L. A. MacColl
17 - W. A. Shewhart
18 - S. A. Schelkunoff
19 - C. E. Shannon
20 - Dept. 1000 Files
MM-45-110-92
DATE: September 1, 1945
AUTHOR: C. E. Shannon
INDEX NO.: P0.4
ABSTRACT
A mathematical theory of secrecy systems is developed. Three main
problems are considered. (1) A logical formulation of the problem
and a study of the mathematical structure of secrecy systems.
(2) The problem of "theoretical secrecy," i.e., can a system be
solved given unlimited time and how much material must be
intercepted to obtain a unique solution to cryptograms. A secrecy
measure called the "equivocation" is defined and its properties
developed. (3) The problem of "practical secrecy." How can systems
be made difficult to solve, even though a solution is theoretically
possible.
A Mathematical Theory of Cryptography - Case 20878 (4)
MM-45-110-92
September 1, 1945
Index P0.4
Introduction and Summary
In the present paper a mathematical theory of cryptography and
secrecy systems is developed. The entire approach is on a
theoretical level and is intended to complement the treatment found
in standard works on cryptography.* There, a detailed study is made
of the many standard types of codes and ciphers, and of the ways of
breaking them. We will be more concerned with the general
mathematical structure and properties of secrecy systems.
The presentation is mathematical in character. We first define the
pertinent terms abstractly and then develop our results as lemmas
and theorems. Proofs which do not contribute to an understanding of
the theorems have been placed in the appendix.
The mathematics required is drawn chiefly from
probability theory and from abstract algebra. The reader is
assumed to have some familiarity with these two fields. A
knowledge of the elements of cryptography will also be help-
ful although not required.
The treatment is limited in certain ways. First, there are two
general types of secrecy system: (1) concealment systems, including
such methods as invisible ink, concealing a message in an innocent
text, or in a fake covering cryptogram, or other methods in which
the existence of the message is concealed from the enemy; (2) "true"
secrecy systems where the meaning of the message is concealed by
cipher, code, etc., although its existence is not hidden. We
consider only the second type; concealment systems are more of a
psychological than a mathematical problem. Secondly, the treatment
is limited to the case of discrete information, where the
information to be enciphered consists of a sequence of discrete
symbols, each chosen from a finite set. These symbols may be letters
in a language, words of a language, amplitude levels of a
"quantized" speech or video signal, etc., but the main emphasis and
thinking has been concerned with the case of letters. A preliminary
survey indicates that the methods and analysis can be generalized to
study continuous cases, and to take into account the special
characteristics of speech secrecy systems.
* See, for example, H. F. Gaines, "Elementary Cryptanalysis," or
M. Givierge, "Cours de Cryptographie."
The paper is divided into three parts. The main results of these
sections will now be briefly summarized. The first part deals with
the basic mathematical structure of language and of secrecy systems.
A language is considered for cryptographic purposes to be a
stochastic process which produces a discrete sequence of symbols in
accordance with some system of probabilities. Associated with a
language there is a certain parameter D which we call the redundancy
of the language. D measures, in a sense, how much a text in the
language can be reduced in length without losing any information.
As a simple example, if each word in a text is repeated a reduction
of 50 per cent is immediately possible. Further reductions may be
possible due to the statistical structure of the language, the high
frequencies of certain letters or words, etc. The redundancy is of
considerable importance in the study of secrecy systems.
A secrecy system is defined abstractly as a set of transformations
of one space (the set of possible messages) into a second space (the
set of possible cryptograms). Each transformation of the set
corresponds to enciphering with a particular key and the
transformations are supposed reversible (non-singular) so that
unique deciphering is possible when the key is known.
Each key and therefore each transformation is assumed to have an
a priori probability associated with it, namely the probability of
choosing that key. The set of messages or message space is also
assumed to have a priori probabilities for the various messages,
i.e., to be a probability or measure space.
In the usual cases the "messages" consist of sequences of "letters."
In this case as noted above the message space is represented by a
stochastic process which generates sequences of letters according to
some probability structure.
These probabilities for various keys and messages are actually the
enemy cryptanalyst's a priori probabilities for the choices in
question, and represent his a priori knowledge of the situation. To
use the system a key is first selected and sent to the receiving
point. The choice of a key determines a particular transformation in
the set forming the system. Then a message is selected and the
particular transformation applied to this message to produce a
cryptogram. This cryptogram is transmitted to the receiving point by
a channel that may be intercepted by the enemy. At the receiving end
the inverse of the particular transformation is applied to the
cryptogram to recover the original message.
If the enemy intercepts the cryptogram he can calculate from it the
a posteriori probabilities of the various possible messages and keys
which might have produced this cryptogram. This set of a posteriori
probabilities constitutes his knowledge of the key and message after
the interception.* The calculation of these a posteriori
probabilities is the generalized problem of cryptanalysis.
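The calculation of a posteriori probabilities can be made concrete on a very small system. The following is a minimal sketch, not taken from the memorandum: the five-letter alphabet, the cyclic shift cipher, the four-message "language" and its prior probabilities are all hypothetical choices made for illustration. It applies Bayes' rule to the first N intercepted letters of a cryptogram.

```python
from fractions import Fraction

ALPHABET = "ABCDE"  # hypothetical five-letter alphabet

def shift_encipher(message, key):
    """Advance each letter `key` places in ALPHABET, cyclically."""
    n = len(ALPHABET)
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % n] for ch in message)

def posterior(intercepted, message_prior, key_prior):
    """P(message | first letters of cryptogram) by Bayes' rule."""
    n = len(intercepted)
    joint = {}
    for msg, p_m in message_prior.items():
        for key, p_k in key_prior.items():
            if shift_encipher(msg, key)[:n] == intercepted:
                joint[msg] = joint.get(msg, 0) + p_m * p_k
    total = sum(joint.values())
    return {msg: p / total for msg, p in joint.items()}

# Hypothetical a priori probabilities of the four possible messages.
message_prior = {"BAD": Fraction(1, 2), "CBA": Fraction(1, 4),
                 "DEB": Fraction(1, 8), "ACE": Fraction(1, 8)}
key_prior = {k: Fraction(1, 5) for k in range(5)}  # keys equally likely

cryptogram = shift_encipher("BAD", 2)  # "DCA"
for n in (1, 2, 3):
    print(n, posterior(cryptogram[:n], message_prior, key_prior))
```

In this toy system one intercepted letter leaves the enemy's probabilities unchanged, two letters narrow the field to two messages, and three letters single out one message.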
As an example of these notions, in a simple substitution cipher with
random key there are 26! transformations, corresponding to the 26!
ways we can substitute for 26 different letters. These are all
equally likely and each therefore has an a priori probability 1/26!.
If this is applied to "normal English," the cryptanalyst being
assumed to have no knowledge of the message source other than that
it is English, the a priori probabilities of various messages of N
letters are merely their frequencies in normal English text.
If the enemy intercepts N letters of cryptogram in this system his
probabilities change. If N is large enough (say 50 letters) there is
usually a single message of a posteriori probability nearly unity,
while all others have a total probability nearly zero. Thus there is
an essentially unique "solution" to the cryptogram. For N smaller
(say N = 15) there will usually be many messages and keys of
comparable probability, with no single one nearly unity. In this
case there are multiple "solutions" to the cryptogram.
Considering a secrecy system to be a set of transformations of one
space into another with definite probabilities associated with each
transformation, there are two natural combining operations which
produce a third system from two given systems. The first combining
operation is called the product operation and corresponds to
enciphering the message with the first system R and enciphering the
resulting cryptogram with system S, the keys for R and S being
chosen independently. This total operation is a secrecy system whose
transformations consist of all the products (in the usual sense of
products of transformations) of transformations in S with
transformations in R. The probabilities are the products of the
probabilities for the two transformations.
The second combining operation is "weighted addition":
    $T = pR + qS, \qquad p + q = 1 .$
It corresponds to making a preliminary choice as to whether system R
or S is to be used with probabilities p and q, respectively. When
this is done R or S is used as originally defined.
* "Knowledge" is thus identified with a set of propositions having
associated probabilities. We are here at variance with the doctrine
often assumed in philosophical studies which considers knowledge to
be a set of propositions which are either true or false.
It is shown that secrecy systems with these two combining operations
form essentially a "linear associative algebra" with a unit element,
an algebraic variety that has been extensively studied by
mathematicians. Some of the properties of this algebra are
developed.
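The two combining operations can be sketched in a few lines. This is an illustrative model only, under assumptions not in the memorandum: a secrecy system is represented as a dictionary mapping each transformation (a permutation of a three-symbol message space, written as a tuple) to its probability, and the systems R and S below are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def product(R, S):
    """Encipher with R, then with S, the two keys chosen independently."""
    T = defaultdict(Fraction)
    for r, p_r in R.items():
        for s, p_s in S.items():
            composed = tuple(s[r[x]] for x in range(len(r)))
            T[composed] += p_r * p_s  # probabilities multiply
    return dict(T)

def weighted_sum(p, R, q, S):
    """Choose system R with probability p or S with probability q."""
    assert p + q == 1
    T = defaultdict(Fraction)
    for r, p_r in R.items():
        T[r] += p * p_r
    for s, p_s in S.items():
        T[s] += q * p_s
    return dict(T)

third = Fraction(1, 3)
half = Fraction(1, 2)
# Hypothetical R: the three cyclic shifts of (0, 1, 2), equally likely.
R = {(0, 1, 2): third, (1, 2, 0): third, (2, 0, 1): third}
# Hypothetical S: identity or reversal, equally likely.
S = {(0, 1, 2): half, (2, 1, 0): half}

RS = product(R, S)
T = weighted_sum(half, R, half, S)
print(RS)
print(T)
```

Here the product yields all six permutations of three symbols with probability 1/6 each, while the weighted sum keeps only the four transformations present in R or S, with the identity receiving probability 5/12.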
Among the many possible secrecy systems there is one type with many
special properties. This type we call a "pure" system. A system is
pure if for any three transformations $T_i$, $T_j$, $T_k$ in the set
the product
    $T_i T_j^{-1} T_k$
is also a transformation in the set, and all keys are equally
likely. That is, enciphering, deciphering, and enciphering with any
three keys must be equivalent to enciphering with some key.
With a pure cipher it is shown that all keys are essentially
equivalent: they all lead to the same set of a posteriori
probabilities. Furthermore, when a given cryptogram is intercepted
there is a set of messages that might have produced this cryptogram
(a "residue class") and the a posteriori probabilities of messages
in this class are proportional to the a priori probabilities. All
the information the enemy has obtained by intercepting the
cryptogram is a specification of the residue class. Many of the
common ciphers are pure systems, including simple substitution with
random key. In this case the residue class consists of all messages
with the same pattern of letter repetitions as the intercepted
cryptogram.
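The "pattern of letter repetitions" that defines the residue class for simple substitution can be computed directly. The sketch below is illustrative; the function names, the sample message and the seeded random key are assumptions, not part of the memorandum.

```python
import random
import string

def repetition_pattern(text):
    """Number letters by order of first appearance: 'ABBA' -> (0, 1, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in text)

def substitute(text, key):
    """Simple substitution: replace each letter by its image under `key`."""
    return "".join(key[ch] for ch in text)

letters = string.ascii_uppercase
rng = random.Random(0)  # seeded only to make the sketch repeatable
key = dict(zip(letters, rng.sample(letters, len(letters))))

message = "ATTACKATDAWN"  # hypothetical message
cryptogram = substitute(message, key)
print(repetition_pattern(message))
print(repetition_pattern(cryptogram))
```

Whatever key is drawn, the two printed patterns agree, so the cryptogram tells the enemy the repetition pattern of the message and nothing more about which member of the residue class was sent.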
Two systems R and S are defined to be "similar" if there exists a
fixed transformation A with an inverse $A^{-1}$ such that
    $R = AS .$
If R and S are similar, a one-to-one correspondence between the
resulting cryptograms can be set up leading to the same a posteriori
probabilities. The two systems are cryptanalytically the same.
The second main part of the paper deals with the problem of
"theoretical security." How secure is a system against cryptanalysis
when the enemy has unlimited time and manpower available for the
analysis of intercepted cryptograms?
"Perfect Secrecy" is defined by requiring of a system that after a
cryptogram is intercepted by the enemy the a posteriori
probabilities of this cryptogram representing various messages be
identically the same as the a priori probabilities of the same
messages before the interception. It is shown that perfect secrecy
is possible but requires, if the number of messages is finite, the
same number of possible keys. If the message is thought of as being
constantly generated at a given "rate" R (to be defined later), key
must be generated at the same or a greater rate.
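The definition of perfect secrecy can be checked numerically on a small system. The following sketch assumes a hypothetical cipher in which messages and keys are integers and the cryptogram is (message + key) mod n; the prior probabilities are likewise invented for illustration.

```python
from fractions import Fraction

def posteriors(message_prior, keys, n):
    """For each cryptogram c, P(m | c) when c = (m + k) mod n, keys equally likely."""
    p_key = Fraction(1, len(keys))
    result = {}
    for c in range(n):
        joint = {m: p_m * sum(p_key for k in keys if (m + k) % n == c)
                 for m, p_m in message_prior.items()}
        total = sum(joint.values())
        if total:  # skip cryptograms that cannot occur
            result[c] = {m: p / total for m, p in joint.items()}
    return result

def is_perfect(message_prior, keys, n):
    """True if every a posteriori distribution equals the a priori one."""
    return all(post == message_prior
               for post in posteriors(message_prior, keys, n).values())

# Hypothetical a priori probabilities for four messages 0..3.
prior = {0: Fraction(1, 2), 1: Fraction(1, 4),
         2: Fraction(1, 8), 3: Fraction(1, 8)}

print(is_perfect(prior, keys=[0, 1, 2, 3], n=4))  # as many keys as messages
print(is_perfect(prior, keys=[0, 1], n=4))        # fewer keys than messages
```

With four equally likely keys the interception changes nothing; with only two keys each cryptogram rules out half of the messages, so the a posteriori probabilities differ from the a priori ones.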
If a secrecy system with a finite key is used, and N letters of
cryptogram intercepted, there will be, for the enemy, a certain set
of messages with certain probabilities, that this cryptogram could
represent. As N increases the field usually narrows down until
eventually there is a unique "solution" to the cryptogram: one
message with probability essentially unity while all others are
practically zero. A quantity Q(N) is defined, called the
equivocation, which measures in a statistical way how near the
average cryptogram of N letters is to a unique solution; that is,
how uncertain the enemy is of the original message after
intercepting a cryptogram of N letters. Various properties of the
equivocation are deduced; for example, the equivocation of the key
never increases with increasing N. This quantity Q is a theoretical
secrecy index, theoretical in that it allows the enemy unlimited
time to analyse the cryptogram.
The function Q(N) for a certain idealized type of cipher called the
random cipher is determined. With certain corrections this function
can be applied to many cases of practical interest. This gives a way
of calculating approximately how much intercepted material is
required to obtain a solution to a secrecy system. It appears from
this analysis that with ordinary languages and the usual types of
ciphers (not codes) this "unicity distance" is approximately |K|/D.
Here |K| is a number measuring the "size" of the key space. If all
keys are a priori equally likely |K| is the logarithm of the number
of possible keys. D is the redundancy of the language and measures
the excess information content of the language. In simple
substitution with random key on English |K| is $\log_{10} 26!$ or
about 20 and D is about .7 for English. Thus unicity occurs at about
30 letters.
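The estimate |K|/D is easy to evaluate numerically. The sketch below takes D = 0.7 digits per letter from the text and computes the key size for simple substitution directly; the variable names are illustrative.

```python
import math

# Key size in decimal digits: logarithm of the number of equally likely keys.
key_size_digits = math.log10(math.factorial(26))

# Redundancy of English in digits per letter, as quoted in the text.
redundancy = 0.7

unicity_distance = key_size_digits / redundancy
print(round(key_size_digits, 1), round(unicity_distance))
```

Evaluated directly, $\log_{10} 26!$ comes to roughly 26.6, somewhat above the rounded "about 20" quoted above, so the quotient comes out nearer 38 than 30 letters. Either way unicity falls at a few tens of letters.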
It is possible to construct secrecy systems with a finite key for
certain "languages" in which the function Q(N) does not approach
zero as $N \to \infty$. In this case, no matter how much material is
intercepted, the enemy still does not get a unique solution to the
cipher but is left with many alternatives, all of reasonable
probability. Such systems we call ideal systems. It is possible in
any language to approximate such behavior, i.e., to make the
approach to zero of Q(N) recede out to arbitrarily large N. However,
such systems have a number of drawbacks, such as complexity and
sensitivity to errors in transmission of the cryptogram.
The third part of the paper is concerned with "practical secrecy."
Two systems with the same key size may both be uniquely solvable
when N letters have been intercepted, but differ greatly in the
amount of labor required to effect this solution. An analysis of the
basic weaknesses of secrecy systems is made. This leads to methods
for constructing systems which will require a large amount of work
to solve. A certain incompatibility among the various desirable
qualities of secrecy systems is discussed.
PART I
FOUNDATIONS AND ALGEBRAIC STRUCTURE OF SECRECY SYSTEMS
1. Choice, Information and Uncertainty
Suppose we have a set of possible events whose probabilities of
occurrence are $p_1, p_2, \ldots, p_n$. These probabilities are
known, but that is all we know concerning which event will occur.
Can we define a quantity which will measure in some sense how
"uncertain" we are of the outcome? How much "choice" is involved in
the selection of the event by the chance element that operates with
these probabilities? We propose as a numerical measure of this
rather vague notion the quantity
    $H = -\sum_{i=1}^{n} p_i \log p_i .$
There are many reasons for this particular formula. Quantities of
this kind appear continually in the present paper and in the study
of the transmission of information.
To justify this definition we will state a number of properties that
follow from it. These properties will not be proved here,* but are
easily deduced from the definition.
Properties of $H = -\sum p_i \log p_i$.
1. H = 0 if and only if all the $p_i$ but one are zero, this one
having the value unity. Thus only when we are certain of the
outcome does H vanish.
2. For a given n, H is a maximum and equal to log n if and only if
all the $p_i$ are equal (i.e. 1/n). This is also intuitively the
most uncertain situation.
3. Suppose there are two events in question, with m possibilities
for the first and n for the second. Let $p_{ij}$ be the probability
of the joint occurrence of i for the first and j for the second. The
uncertainty of the joint event is
    $H = -\sum_{i,j} p_{ij} \log p_{ij} .$
For given probabilities $p_i = \sum_j p_{ij}$ for the first and
$q_j = \sum_i p_{ij}$ for the second, the quantity H is maximized if
and only if the events are independent, i.e., $p_{ij} = p_i q_j$.
This maximum value is the sum of the individual uncertainties
    $H = H_1 + H_2 = -\sum_i p_i \log p_i - \sum_j q_j \log q_j .$
These facts can be generalized to any number of different events.
* It is intended to develop these results in coherent fashion in a
forthcoming memorandum on the transmission of information.
4. Suppose there are two chance events A and B as in 3, not
necessarily independent. We define the mean conditional uncertainty
of B, knowing A, as
    $\bar{H}_A(B) = \sum_A p(A)\, H_A(B)$
where $H_A(B)$ is the uncertainty of B when A has a definite value
A. Thus $\bar{H}_A(B)$ is the average uncertainty of B for all
different events A, weighted according to their different
probabilities of occurrence. The uncertainty of the joint event is
the sum of the uncertainty of the first and the mean conditional
uncertainty of the second. In symbols
    $H(A,B) = H(A) + \bar{H}_A(B) .$
This is true whether or not there are any causal connections or
correlations between the two events.
5. In the same situation the uncertainty of B is not greater than
the joint uncertainty H(A,B),
    $H(B) \le H(A,B) .$
The equality holds if and only if every B (of probability greater
than zero) is consistent with only one A. That is, if A is uniquely
determined by B.
6. From properties 3 and 4 we have
    $H(A) + H(B) \ge H(A,B)$
    $H(B) \ge H(A,B) - H(A) = H(A) + \bar{H}_A(B) - H(A)$
    $H(B) \ge \bar{H}_A(B) .$
Thus the uncertainty of B is not less than its average value when we
know A. Additional information never increases average uncertainty.
The equality holds if and only if A and B are independent.
7. Suppose we have a set of probabilities $p_1, p_2, \ldots, p_n$.
Any change toward equalization of these (supposing them unequal)
increases H. Thus if $p_1 < p_2$ and we increase $p_1$, decreasing
$p_2$ an equal amount (to keep the sum $\sum p_i$ constant at unity)
so that $p_1$ and $p_2$ are more nearly equal, then H increases.
More generally if we perform any "averaging" operation on the $p_i$
of the form
    $p_i' = \sum_j a_{ij} p_j$
where $\sum_i a_{ij} = \sum_j a_{ij} = 1$ and all $a_{ij} \ge 0$,
then H increases (except in the special case where this
transformation amounts to no more than a permutation of the $p_i$,
with H of course remaining the same).
8. H measures in a certain sense how much "information is generated"
when the choice is made. Suppose such a chance event occurs and we
wish to describe which of the n possible events took place. The
average amount of paper required to write it down in a properly
chosen notation is, in the cases of interest to us, about
proportional to H. Thus there might be $10^{30} + 10^{50}$ possible
events, with $10^{30}$ of them having a probability of
$\tfrac12 \cdot 10^{-30}$ and $10^{50}$ of them having a probability
of $\tfrac12 \cdot 10^{-50}$. We could set up a notational system to
describe which event occurs as follows. We number the events from 1
up to $10^{30} + 10^{50}$ and when one occurs write down the
corresponding number. The average amount of paper required will be
proportional to the average number of digits we need. This will be
nearly 30 if the event is in the first group of $10^{30}$, and about
50 if in the second group. Thus the average number of digits is
about 40. We also have
    $H = -10^{30} \cdot \tfrac12 10^{-30} \log \tfrac12 10^{-30} - 10^{50} \cdot \tfrac12 10^{-50} \log \tfrac12 10^{-50} \approx 40 .$
9. Although the last result is only approximately true when the
number of choices is finite it becomes exactly true when an
unlimited sequence of choices is made. Thus if a sequence of N
independent choices is made, each choice being from n possibilities
with probabilities $p_1, p_2, \ldots, p_n$, then the total amount of
information generated is
    $H = -N \sum_i p_i \log p_i .$
If N is sufficiently large, the expected number of digits required
to register the particular choice made is arbitrarily close to H,
providing the correspondence between sequences of digits and sets of
choices is correctly made. If incorrectly made it will be greater
than H. Moreover if N is sufficiently large the probability of
needing more than H digits is very small.
10. It can be shown that if we require certain reasonable properties
of a measure of choice or uncertainty then the formula
$-\sum p_i \log p_i$ necessarily follows. These required properties
and the proof of this statement are given in Appendix I. The chief
property is that the measure be in a sense additive: if a choice be
decomposed into a series of choices the total choice is the sum
(properly weighted) of the individual choices.
11. Finally we note that quantities of the type $\sum p_i \log p_i$
have appeared previously as measures of randomness, particularly in
statistical mechanics. Indeed the H in Boltzmann's H theorem is
defined in this way, $p_i$ being the probability of a system being
in cell i of its phase space. Most of the entropy formulas contain
terms of this type.
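Several of the properties listed above can be checked numerically. The sketch below is illustrative only: the joint distribution, the probabilities being averaged and the averaging matrix are hypothetical values, and logarithms are taken to base 10.

```python
import math

def H(probs, base=10):
    """Uncertainty -sum p log p; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def mean_conditional(joint):
    """Mean conditional uncertainty of B knowing A; rows of `joint` index A."""
    total = 0.0
    for row in joint:
        p_a = sum(row)
        if p_a > 0:
            total += p_a * H([p / p_a for p in row])
    return total

# Hypothetical joint probabilities p_ij of two dependent events A and B.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
p_a = [sum(row) for row in joint]
p_b = [sum(col) for col in zip(*joint)]
h_joint = H([p for row in joint for p in row])
h_cond = mean_conditional(joint)

# Hypothetical averaging operation (rows and columns each sum to 1).
before = [0.7, 0.2, 0.1]
averaging = [[0.5, 0.5, 0.0],
             [0.5, 0.5, 0.0],
             [0.0, 0.0, 1.0]]
after = [sum(a * p for a, p in zip(row, before)) for row in averaging]

print("property 1:", H([1, 0, 0]))
print("property 2:", H([0.25] * 4), math.log10(4))
print("property 4:", h_joint, H(p_a) + h_cond)
print("property 6:", H(p_b), ">=", h_cond)
print("property 7:", H(before), "<", H(after))
```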
The base which is used in taking logarithms in the formula amounts
to a choice of the unit of measure. If the base is ten we will call
the resulting units "digits;" if the base is two the units will be
called "alternatives." One digit is about 3.3 alternatives. A choice
from 1000 equally likely possibilities is 3 digits or about 10
alternatives.
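The change of units, and the numerical example given under property 8, can be verified in a few lines. This is a small illustrative calculation; the variable names are not from the memorandum.

```python
import math

# One digit (base 10) expressed in alternatives (base 2).
alternatives_per_digit = math.log2(10)

# A choice among 1000 equally likely possibilities, in both units.
choice_in_digits = math.log10(1000)
choice_in_alternatives = math.log2(1000)

# Property 8: 10^30 events of probability (1/2)10^-30 each and
# 10^50 events of probability (1/2)10^-50 each, measured in digits.
# Each group carries total probability 1/2.
h_example = (-0.5 * math.log10(0.5e-30)) + (-0.5 * math.log10(0.5e-50))

print(alternatives_per_digit, choice_in_digits, choice_in_alternatives)
print(h_example)
```

The last value is 40 plus log 2, about 40.3 digits, in agreement with the average of about 40 digits found for the numbering scheme.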
2. Language as a Stochastic Process
A natural language, such as English, can be studied from many points
of view: lexicography, syntax, semantics, history, aesthetics, etc.
The only properties of a language of interest in cryptography are
statistical properties. What are the frequencies of the various
letters, of different digrams (pairs of letters), trigrams, words,
phrases, etc.? What is the probability that a given word occurs in a
certain message? The "meaning" of a message has significance only in
its influence on these probabilities. For our purposes all other
properties of language can be omitted. We consider a language,
therefore, to be a stochastic (i.e. a statistical) process which
generates a sequence of symbols according to some system of
probabilities. The symbols will be the letters of the language
together with punctuation, spaces, etc., if these occur.
Conversely any stochastic process which produces a discrete sequence
of symbols will be said to be a language. This will include such
cases as:
1. Natural written languages such as English, German, Chinese.
2. Continuous information sources that have been rendered discrete
by some quantizing process. For example, the quantized speech from a
PCM transmitter, or a quantized television signal.
3. "Artificial" languages, where we merely define abstractly a
stochastic process which generates a sequence of symbols. The
following are examples of artificial languages.
(A) Suppose we have 5 letters A, B, C, D, E which are chosen each
with probability .2, successive choices being independent. This
would lead to a sequence of which the following is a typical
example.
    B D C B C E C C C A D C B D D A A E C E E A
    A B B D A E E C A C E E B A E E C B C E A D
This was constructed with the use of a table of random numbers.*
* Kendall and Smith, "Tables of Random Sampling Numbers," Cambridge,
1939.
(B) Using the same 5 letters let the probabilities be .4, .1, .2,
.2, .1 respectively, with successive choices independent. A typical
"text" in this language is then:
    A A A C D C B D C E A A D A D A C E D A
    E A D C A B E D A D D C E C A A A A A D
(C) A more complicated structure is obtained if successive letters
are not chosen independently but their probabilities depend on
preceding letters. In the simplest case of this type a choice
depends only on the preceding letter and not on ones before that.
The statistical structure can then be described by a set of
transition probabilities $p_i(j)$, the probability that letter i is
followed by letter j. The indices i and j range over all the letters
in the language. A second equivalent way of specifying the structure
is to give the digram probabilities p(i,j), the relative frequency
of the digram i j in the language. The letter frequencies p(i) (the
probability of letter i), the transition probabilities $p_i(j)$ and
the digram probabilities p(i,j) are related by the following
formulas.
    $p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j)\, p_j(i)$
    $p(i,j) = p(i)\, p_i(j)$
    $\sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1 .$
As a specific example suppose there are three letters A, B, C with
the probability tables:
Transition probabilities $p_i(j)$:
               j
           A     B     C
       A   0    .8    .2
    i  B  .5    .5     0
       C  .5    .4    .1
Letter probabilities p(i):
       A   9/27
       B  16/27
       C   2/27
Digram probabilities p(i,j):
               j
           A      B      C
       A   0     4/15   1/15
    i  B  8/27   8/27    0
       C  1/27   4/135  1/135
A typical text in this language is the following.
    A B B A B A B A B A B A B A B B B A B B B B B A B
    A B A B A B A B B B A C A C A B B A B B B B A B B
    A B A C B B B A B A
The next increase in complexity would involve trigram frequencies
but no more. The choice of a letter would depend on the preceding
two letters but not on the text before that point. A set of trigram
frequencies p(i,j,k) or equivalently a set of transition
probabilities $p_{ij}(k)$ would be required. Continuing in this way
one obtains successively more complicated stochastic processes. In
the general n-gram case a set of n-gram probabilities
$p(i_1, i_2, \ldots, i_n)$ or of transition probabilities
$p_{i_1, i_2, \ldots, i_{n-1}}(i_n)$ is required to specify the
statistical structure.
(D) Stochastic processes can also be defined which produce a text
consisting of a sequence of "words." Suppose there are 5 letters A,
B, C, D, E and 16 "words" in the language with associated
probabilities:
    .10 A       .16 BEBE    .11 CABED   .04 DEB
    .04 ADEB    .04 BED     .05 CEED    .15 DEED
    .05 ADEE    .02 BEED    .08 DAB     .01 EAB
    .01 BADD    .05 CA      .04 DAD     .05 EE
Suppose successive "words" are chosen independently and are
separated by a space. A typical message might be:
    DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE
    BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE
    CABED BEBE BED DAB DEED ADEB
If all the words are of finite length this process is equivalent to
one of the preceding type, but the description may be simpler in
terms of the word structure and probabilities. We may also
generalize here and introduce transition probabilities between
words, etc.
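The relations among letter, transition and digram probabilities can be checked for the three-letter example (C), and the same tables can drive a simple text generator. The sketch below reads the transition probabilities as A to (A, B, C) = (0, .8, .2), B to (.5, .5, 0), C to (.5, .4, .1), with letter probabilities 9/27, 16/27, 2/27; the function name and the use of a seeded pseudo-random generator in place of a table of random numbers are assumptions made for illustration.

```python
import random
from fractions import Fraction as F

LETTERS = "ABC"
# transition[i][j] is p_i(j), the probability that letter i is followed by j.
transition = {
    "A": {"A": F(0), "B": F(4, 5), "C": F(1, 5)},
    "B": {"A": F(1, 2), "B": F(1, 2), "C": F(0)},
    "C": {"A": F(1, 2), "B": F(2, 5), "C": F(1, 10)},
}
letter_prob = {"A": F(9, 27), "B": F(16, 27), "C": F(2, 27)}

# Digram probabilities from p(i,j) = p(i) p_i(j).
digram = {(i, j): letter_prob[i] * transition[i][j]
          for i in LETTERS for j in LETTERS}

def sample_text(length, rng):
    """Generate letters one at a time, each depending only on its predecessor."""
    weights = [float(letter_prob[x]) for x in LETTERS]
    letter = rng.choices(LETTERS, weights=weights)[0]
    out = [letter]
    while len(out) < length:
        row = transition[letter]
        letter = rng.choices(LETTERS, weights=[float(row[x]) for x in LETTERS])[0]
        out.append(letter)
    return " ".join(out)

for i in LETTERS:
    # p(i) = sum_j p(i,j) = sum_j p(j,i)
    print(i, sum(digram[i, j] for j in LETTERS), sum(digram[j, i] for j in LETTERS))
print(sample_text(40, random.Random(1)))
```

Both sums reproduce the letter probabilities, and the generated text never contains the digrams AA or BC, whose transition probabilities are zero.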
These artificial languages are useful in constructing simple
problems and examples to illustrate various possibilities. We can
also approximate to a natural language by means of a series of
simple artificial languages. The zero order approximation is
obtained by choosing all letters with the same probability and
independently. The first order approximation is obtained by choosing
successive letters independently but each letter having the same
probability that it does in the natural language. Thus in the first
order approximation to English E is chosen with probability .12 (its
frequency in normal English) and W with probability .02, but there
is no influence between adjacent letters and no tendency to form the
preferred digrams such as TH, ED, etc. In the second order
approximation digram structure is introduced. After a letter is
chosen, the next one is chosen in accordance with the frequencies
with which the various letters follow the first one. This requires a
table of digram frequencies $p_i(j)$, the frequency with which
letter j follows letter i. In the third order approximation trigram
structure is introduced. Each letter is chosen with probabilities
which depend on the preceding two letters.
3. The Series of Approximations to English
To give a visual idea of how this series of processes approaches a
language, typical sequences in the approximations to English have
been constructed and are given below. In all cases we have assumed a
27 symbol "alphabet," the 26 letters and a space.
1. Zero order approximation (symbols independent and equi-probable).
    XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD
    QPAAMKBZAACIBZLHJQD
2. First order approximation (symbols independent but with
frequencies of English text).
    OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA
    OOBTTVA NAH BRL
3. Second order approximation (digram structure as in English).
    ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D
    ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE
    SEACE CTISBE
4. Third order approximation (trigram structure as in English).
    IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME
    OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE
5. 1st Order Word Approximation. Rather than continue with
tetragram, ..., n-gram structure, it is easier and better to jump at
this point to word units. Here words are chosen independently but
with their appropriate frequencies.
    REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN
    DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO
    EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE
    THESE
6. 2nd Order Word Approximation. The word transition probabilities
are correct but no further structure is included.
    THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER
    THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER
    METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD
    THE PROBLEM FOR AN UNEXPECTED
The resemblance to ordinary English text increases quite noticeably
at each of the above steps. Note that these samples have reasonably
good structure out to about twice the range that is taken into
account in their construction. Thus in (3) the statistical process
insures reasonable text for two-letter sequences, but four-letter
sequences from the sample can usually be fitted into good sentences.
In (6) sequences of four or more words can easily be placed in
sentences without unusual or strained constructions. The particular
sequence of ten words "attack on an English writer that the
character of this" is not at all unreasonable.
The first two samples were constructed by the use of a book of
random numbers in conjunction for (2) with a table of letter
frequencies. This method might have been continued for (3), (4), and
(5), since digram, trigram, and word frequency tables are available,
but a simpler equivalent method was used. To construct (3) for
example one opens a book at random and selects a letter at random on
the page. This letter is recorded. The book is then opened to
another page and one reads until this letter is encountered. The
succeeding letter is then recorded. Turning to another page this
second letter is searched for and the succeeding letter recorded,
etc. A similar process was used for (4), (5), and (6). It would be
interesting if further approximations could be constructed, but the
labor involved becomes enormous at the next stage.
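The book-opening procedure for the second order approximation can be imitated mechanically. In the sketch below the "book" is a short hypothetical corpus string treated as circular, "opening to another page" is a random starting position, and a seeded pseudo-random generator replaces the human choice; none of these details come from the memorandum.

```python
import random

def second_order_sample(corpus, length, rng):
    """Imitate the book method: find the current letter, record its successor."""
    current = rng.choice(corpus)  # a letter selected at random
    out = [current]
    while len(out) < length:
        start = rng.randrange(len(corpus))  # open at another place
        # Read forward (wrapping around) until the current letter is met.
        for offset in range(len(corpus)):
            pos = (start + offset) % len(corpus)
            if corpus[pos] == current:
                current = corpus[(pos + 1) % len(corpus)]
                break
        out.append(current)
    return "".join(out)

# Hypothetical corpus standing in for the book.
corpus = ("THE ONLY PROPERTIES OF A LANGUAGE OF INTEREST IN "
          "CRYPTOGRAPHY ARE STATISTICAL PROPERTIES ")
sample = second_order_sample(corpus, 60, random.Random(7))
print(sample)
```

Every pair of adjacent symbols in the output is a digram that occurs in the corpus, which is the property the second order approximation is meant to have. With so small a corpus the relative frequencies are only roughly those of the source.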
The stochastic process 6 is already sufficiently close to English
for many cryptographic purposes since most cryptanalysis is based on
"local" structure of not more than two or three words in length.
4. Graphical Representation of a Markoff Process
Stochastic processes of the type described above are known
mathematically as discrete Markoff processes and have been
extensively studied in the literature.* The general case can be
described as follows. There exist a finite number of possible
"states" of a system: $S_1, S_2, \ldots, S_n$. In addition there is
a set of transition probabilities $p_i(j)$, the probability that if
the system is in state $S_i$ it will next go to state $S_j$. To make
this Markoff process into a language generator we need only assume
that a letter is produced for each transition from one state to
another. The states will correspond to the "residue of influence"
from preceding letters.
* For a detailed treatment see M. Fréchet, "Méthode des fonctions
arbitraires. Théorie des événements en chaîne dans le cas d'un
nombre fini d'états possibles." Paris, Gauthier-Villars, 1938.
The situation can be represented graphically as shown
in Figs. 1, 2, 3 and 4. The "states" are the junction points
in the graph and the probabilities and letters produced for a
transition are given beside the corresponding line. Fig. 1 is
for the example B in Section 2, while Fig. 2 corresponds to the
example C. In Fig. 1 there is only one state since successive
letters are independent. In Fig. 2 there are as many states
as letters. If a trigram example were constructed there would
be at most $n^2$ states corresponding to the possible pairs of
letters preceding the one being chosen. Figs. 3 and 4 show
graphs for the case of word structure in example D. In these
S corresponds to the "space" symbol. In Fig. 3 each word has
a separate chain of branches from the left to the right junction
point, while in Fig. 4 the branches have been combined,
simplifying the graph.
5. Pure and Mixed Languages
As we have indicated above a "language" for our
purposes can be considered to be generated by a Markoff process.
Among the possible discrete Markoff processes there is a group
with special properties of significance in cryptographic work.
This special class consists of the "ergodic" processes and we
shall call the corresponding languages "pure languages."
Although a rigorous definition of an ergodic process is somewhat
involved, the general idea is simple. In an ergodic process
every sequence produced by the process is the same in
statistical properties. Thus the letter frequencies, digram
frequencies, etc., obtained from particular sequences will, as the
lengths of the sequences increase, approach definite limits
independent of the particular sequence. Actually this is not
true of every sequence but the set for which it is false has
probability zero. Roughly the ergodic property means
statistical homogeneity.
All the examples of artificial languages given above
are pure, the corresponding Markoff process being ergodic.
This property is related to the structure of the corresponding
graph. If the graph has two properties the language it generates
will be pure. These properties are:
1. The graph cannot be divided into two parts A and B such
that it is impossible to go from junction points in part
A to junction points in part B along lines of the graph
in the direction of arrows and also impossible to go
from nodes in part B to nodes in part A.
2. A closed series of lines in the graph with all arrows
on the lines pointing in the same orientation will be
called a "circuit." The "length" of a circuit is the
number of lines in it. Thus in Fig. 4 the series BEE
is a circuit of length 4. The second property required
is that the greatest common divisor of the lengths of
all circuits in the graph be one.
If the first condition is satisfied but the second
one violated by having the greatest common divisor equal to
d > 1, the sequences have a certain type of periodic structure.
The various sequences fall into d different classes which are
statistically the same apart from a shift of the origin (i.e.,
which letter in the sequence is called letter 1). By a shift
of from 0 up to d - 1 any sequence can be made statistically
equivalent to any other. A simple example with d = 2 is the
following. There are three possible letters a, b, c. Letter
a is followed with either b or c with probabilities 1/3 and 2/3
respectively. Either b or c is always followed by letter a.
Thus a typical sequence is
    a b a c a c a c a b a c a b a b a c a c
This type of situation is not of much importance for our work.
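The second property can be checked mechanically. The sketch below computes the greatest common divisor of all circuit lengths for the three-letter example, taking one state per preceding letter. It assumes that every state can be reached from every other (a stronger form of the first condition), and uses a standard breadth-first-search shortcut rather than listing circuits; the function name circuit_gcd is hypothetical.

```python
from collections import deque
from math import gcd


def circuit_gcd(graph, start):
    """G.c.d. of all circuit lengths, assuming every state reaches every other."""
    dist = {start: 0}
    queue = deque([start])
    while queue:                            # breadth-first distances from start
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    d = 0
    for u in dist:                          # each line u -> v contributes
        for v in graph[u]:                  # dist[u] + 1 - dist[v] to the g.c.d.
            d = gcd(d, dist[u] + 1 - dist[v])
    return d


# States are the preceding letter: a goes to b or c, b and c go back to a.
EXAMPLE = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(circuit_gcd(EXAMPLE, "a"))            # every circuit has even length

# Adding a line from b back to b creates a circuit of length 1.
WITH_LOOP = {"a": ["b", "c"], "b": ["a", "b"], "c": ["a"]}
print(circuit_gcd(WITH_LOOP, "a"))
```

For the example graph the result is 2, in agreement with d = 2 in the text; with the extra line the result drops to 1 and the second property holds.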
If the first condition is violated the graph may be
"separated" into a set of subgraphs each of which satisfies the
first condition. We will assume that the second condition is
also satisfied for each subgraph. We have in this case what
may be called a "mixed" language made up of a number of pure
components. The components correspond to the various subgraphs.
If $L_1, L_2, L_3, \ldots$ are the component languages we may write
    $$L = p_1 L_1 + p_2 L_2 + p_3 L_3 + \cdots$$
where $p_i$ is the a priori probability of the component language $L_i$.
Physically the situation represented is this. There
are several different languages $L_1, L_2, L_3, \ldots$ which are each
of homogeneous statistical structure (i.e., they are pure
languages). We do not know a priori which is to be used, but
once the sequence starts in a given pure component it continues
indefinitely according to the statistical structure of that
component. We do have, however, a set of a priori probabilities
for the various components, $p_1, p_2, \ldots$.
As an example one may take two of the artificial
languages defined above and assume $p_1 = .2$ and $p_2 = .8$. A
sequence from the mixed language
    $$L = .2 L_1 + .8 L_2$$
would be obtained by choosing first $L_1$ or $L_2$ with probabilities
.2 and .8 and after this choice generating a sequence from
whichever was chosen.
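A minimal sketch of drawing from such a mixed language follows. The two component languages here are hypothetical stand-ins with independent letters over disjoint alphabets (chosen only so that the component in use is visible in the output); they are not the artificial languages of Section 2.

```python
import random


def independent_letters(alphabet, weights):
    """Return a generator for a toy pure language with independent letters."""
    def generate(length, rng):
        return "".join(rng.choices(alphabet, weights=weights, k=length))
    return generate


def sample_mixed(components, priors, length, rng):
    """Choose one pure component by its a priori probability, then stay in it."""
    index = rng.choices(range(len(components)), weights=priors)[0]
    return index, components[index](length, rng)


L1 = independent_letters("AB", [0.9, 0.1])   # hypothetical pure language
L2 = independent_letters("CD", [0.5, 0.5])   # hypothetical pure language
print(sample_mixed([L1, L2], [0.2, 0.8], 20, random.Random(3)))
```

The choice of component is made once per sequence, not once per letter; that is what distinguishes the mixed language from a single pure language with averaged statistics.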
A natural language, such as English or German, is
not, of course, pure. Different kinds of text, literary,
newspaper, technical or military, display consistently different
types of structure. These differences are small, however,
in comparison with the differences between different natural
languages. If only local structure (letter, digram and trigram
frequencies, for instance) is of much importance, it is
reasonable to consider "normal English" to be nearly pure.
6. Information Rate and Redundancy of a Language
Suppose we have a pure language L produced by a given
Markoff process. Associated with the language there are certain
parameters which are of significance in questions of trans-
forming the language and in cryptography. The most important
of these is what we will call the "information rate" R for the
language. It measures the rate at which the Markoff process
"generates information," as determined by the measurement of
the amount of choice available on the average per letter of
text that is produced. In Section 1 we defined the amount of
choice when there are various possibilities with probabilities
$p_1, p_2, \ldots, p_n$ as
    $$H = -\sum_i p_i \log p_i .$$
In a Markoff process with a number of different "states" there
will be a choice value $H_i$ for each of these states and a
probability of being in each of the states (or a frequency with which
this state occurs). If this relative frequency for state i is
$P_i$, the average amount of choice is
    $$R = \sum_i P_i H_i$$
summed over all the states. This is the definition of the
information rate for the language. If $p_i(j)$ is the probability
of producing letter j when in state i we have
    $$H_i = -\sum_j p_i(j) \log p_i(j)$$
the sum being over all the letters in the language. Thus
    $$R = -\sum_{i,j} P_i \, p_i(j) \log p_i(j)$$
The information rate R has the units of alternatives
(or digits) per letter since it measures the average amount of
choice per letter of text that is produced.
A second parameter of importance is the "maximum rate"
$R_0$ for the source. This is defined simply as the logarithm of
the number of different letters in the language. $R_0$ is also
measured in alternatives or digits per letter. If successive
letters are chosen independently and each letter is equally
likely $R_0 = R$. Otherwise we have $R < R_0$.
$R_0$ and R are actually two limiting cases of information
rates for the language. $R_0$ may be said to be the rate
when no statistical structure is taken into consideration and
R is the rate when all the structure is taken into account.
Between these there is an infinite series of rates $R_1, R_2,
R_3, \ldots$ which take some of the statistical structure into
account. $R_1$ takes the letter frequencies into account and is
defined by
    $$R_1 = -\sum_i p(i) \log p(i)$$
where p(i) is the probability of letter i. $R_2$ takes digram
structure into account and is defined by
    $$R_2 = -\sum_{i,j} p(i) \, p_i(j) \log p_i(j)$$
where the p(i) are letter probabilities and $p_i(j)$ the transition
probabilities, i.e., the probability of letter i being followed
by letter j. In general we define
    $$R_n = -\sum p(i_1, \ldots, i_{n-1}) \, p_{i_1 \ldots i_{n-1}}(i_n) \log p_{i_1 \ldots i_{n-1}}(i_n)$$
where the sum is on all indices $i_1, \ldots, i_n$ and $p(i_1, \ldots, i_{n-1})$
is the probability of the (n-1)-gram $i_1 \ldots i_{n-1}$, with
$p_{i_1 \ldots i_{n-1}}(i_n)$ the probability of this (n-1)-gram being followed
by letter $i_n$. $R_n$ may be called the n-gram information rate for
the language. It can be shown that
    $$R_0 \ge R_1 \ge R_2 \ge \cdots \ge R_\infty = R$$
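As an illustration, the first few rates can be computed for the three-letter example of Section 5 (a followed by b or c, either of which is followed by a). Two assumptions are made in this sketch: the letter frequencies 1/2, 1/6, 1/3 are worked out here from the transition probabilities rather than quoted from the text, and logarithms are taken to base 2, so the rates come out in alternatives per letter.

```python
from math import log2

# Letter frequencies p(i) and transition probabilities p_i(j); the
# frequencies are derived for this sketch, not given in the text.
P = {"a": 1 / 2, "b": 1 / 6, "c": 1 / 3}
TRANS = {"a": {"b": 1 / 3, "c": 2 / 3}, "b": {"a": 1.0}, "c": {"a": 1.0}}


def rate_0(p):
    """Maximum rate: logarithm of the number of different letters."""
    return log2(len(p))


def rate_1(p):
    """Rate taking letter frequencies into account."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def rate_2(p, trans):
    """Rate taking digram structure into account."""
    return -sum(p[i] * q * log2(q)
                for i, row in trans.items()
                for q in row.values() if q > 0)


print(rate_0(P), rate_1(P), rate_2(P, TRANS))
```

Since each letter in this example depends only on the one before it, the digram rate already takes all the structure into account, so here the computed $R_2$ equals R.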
These rates determine how much a language can be "compressed"
in length by a suitable encoding process. A language with
maximum rate $R_0$ and rate R can be transformed in such a way
that a sequence of letters N letters long is transformed into
a sequence of letters only N' letters long where
    $$N' R_0 = N R$$
(This is approximate and only exactly true in the limit as
$N \to \infty$.) Thus the information is "compressed" in the ratio
    $$\frac{R}{R_0}$$
This is the greatest compression ratio possible. It makes use
of all the statistical structure of the language. If only
n-gram structure is made use of, a compression ratio
    $$\frac{R_n}{R_0}$$
is the best possible.
The compression obtained in this way is only a
statistical gain. Some infrequent sequences are encoded into
much longer sequences while the more probable ones go into
shorter sequences so that on the average the length is
decreased. It is the type of compression obtained in telegraphy
by using the shortest telegraph symbol, a single dot, for the
most frequent letter E, while uncommon letters Q, Z, etc., are
encoded into longer telegraph symbols. An average reduction
in time of transmission is obtained but there are possible
sequences, e.g., Q Q Q ..., which require much longer.
Performing a transformation on a language L which
compresses as much as possible will be called reducing L to
a "normal" form. When this has been done it can be shown
that all letters in the output are equally likely and
independent. Actually to realize this transformation would usually
require an infinitely complex machine, but we can always
approximate it as closely as desired with a machine of finite
complexity.
The quantity
    $$D = R_0 - R$$
will be called the redundancy rate of the language. It measures
the excess information that is sent if sequences in the language
are transmitted in their original form (without compression or
reduction to normal form). Correspondingly there is a whole
series of redundancy rates:
    $$D_1 = R_0 - R_1$$
    $$D_2 = R_0 - R_2$$
    $$\cdots$$
    $$D_n = R_0 - R_n$$
    $$\cdots$$
    $$D = R_0 - R$$
$D_n$ is the redundancy rate due to n-gram structure in the
language.
The redundancy D can also be said to measure the
amount of statistical structure in the language. If the
sequence is purely random D = 0 while at the other extreme if
each letter is completely determined by preceding letters with
no freedom of choice, D has its maximum possible value $R_0$. It
is sometimes convenient to use the "relative" redundancy $D/R_0$
which must lie between 0 and 100%.
If we have a source of rate R, maximum rate $R_0$ (both
in digits per letter) and consider the possible sequences of N
letters these fall into two groups for N large. One group of
"high probability" sequences contains about
    $$10^{RN}$$
sequences (where we have assumed R measured in digits per letter).
All of these have substantially the same logarithmic probability.
The remainder of the total of $10^{R_0 N}$ possible sequences are of
very small probability. In fact their total probability
approaches zero as N increases. The logarithm of the probability
of an individual sequence in the high probability group is thus
about -RN. In a precise statement of these results we must allow
a certain fuzziness in R, i.e., replace R by $R \pm \epsilon$ where $\epsilon \to 0$
as $N \to \infty$.
Reduction of a language to normal form is performed
by properly matching the probabilities of sequences to the
length of the corresponding sequences in the normal form. The
"high probability" sequences are translated into short sequences
and the remainder into longer sequences.
An example will clarify the results we have given.
Let the language contain 4 letters A, B, C, D. In a sequence
successive letters are chosen independently, the four letters
having probabilities 1/2, 1/4, 1/8, 1/8, respectively. We have
    $$R_0 = \log_2 4 = 2 \text{ alternatives/letter}$$
and
    $$R_1 = R_2 = \cdots = R = -\left(\tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{4} \log_2 \tfrac{1}{4} + \tfrac{2}{8} \log_2 \tfrac{1}{8}\right)$$
    $$= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4} = \tfrac{7}{4} \text{ alternatives/letter}$$
By a suitable transformation the average length of sequences
can be reduced by the factor (7/4)/2 = 7/8. A transformation to do
it is the following. First we translate into a sequence of
binary digits (0 or 1) by the following table
    A    0
    B    10
    C    110
    D    111
After this pairs of the binary digits are translated into the
original alphabet as follows
    00   A'
    01   B'
    10   C'
    11   D'
For a typical sequence this works out as shown below:
    A B  C   A B  A C   B  B  D   A A D   A D   A
    0 10 110 0 10 0 110 10 10 111 0 0 111 0 111 0
Regrouping and translation back into letters:
    01 01 10 01 00 11 01 01 01 11 00 11 10 11 10
    B' B' C' B' A' D' B' B' B' D' A' D' C' D' C'
In this case there are 16 letters in the original and 15 in the
final text. Thus due to the small redundancy and the shortness
of the text only part of the saving is evident. In a long text
however the full reduction of 7/8 would appear. This may be
verified directly in this case. In a long text of N letters
each letter will appear with about its appropriate frequency.
Thus the number of binary digits will be about
    $$N\left[\tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 3 + \tfrac{1}{8} \cdot 3\right] = \tfrac{7}{4} N$$
since each A gives one binary digit, each B gives two, etc. The
number of letters in the final text is half this since each
pair of binary digits goes into one letter. Thus the reduction
is by a factor 7/8.
It is also easy to see in this case that the binary
digits are equally likely and independent, and from this that
the final text letters are also.
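The two-step translation above is easy to check by machine. The sketch below uses the two tables as given; the function name reduce_text and the padding of an odd trailing digit with 0 are assumptions made only so that the example runs on any input.

```python
BINARY = {"A": "0", "B": "10", "C": "110", "D": "111"}
PAIRS = {"00": "A'", "01": "B'", "10": "C'", "11": "D'"}
PROBABILITIES = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}


def reduce_text(text):
    """Translate letters to binary digits, then pairs of digits to letters."""
    digits = "".join(BINARY[letter] for letter in text)
    if len(digits) % 2:
        digits += "0"      # assumption: pad an odd final digit with 0
    return [PAIRS[digits[i:i + 2]] for i in range(0, len(digits), 2)]


final = reduce_text("ABCABACBBDAADADA")
print(len(final), " ".join(final))

# Average number of binary digits per original letter.
print(sum(PROBABILITIES[x] * len(BINARY[x]) for x in BINARY))
```

For the 16-letter sequence of the text this gives the 15 letters shown above, and the average of 7/4 binary digits per letter corresponds to 7/8 of a final letter per original letter.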
This situation is more complicated for mixed languages
and we shall not enter into it here. We may note, however,
that if
    $$L = p_1 L_1 + \cdots + p_n L_n$$
where $L_i$ is pure with rate $R_i$, then the long sequences of L
fall into (n+1) groups. The first n groups correspond to the
pure components. Those in group i number about
    $$10^{R_i N}$$
and have logarithmic probability about
    $$\log p_i - R_i N$$
The last group contains all other sequences and has a small
total probability.
7. Redundancy Characteristic of a Language
The form of the curve D(N) as a function of N may be
called the redundancy characteristic of the language. In a
rough way it describes the way in which the redundancy approaches
its final value. In Fig. 5 several types of characteristics are
shown, all with the same final redundancy. The way in which this
approach occurs is of importance in cryptography. For languages
which reach final redundancy at one or two letters (Curves 1 and 2)
one type of cipher (ideal ciphers) can be used. For those which
remain near zero out to fairly large N (like Curve 5) another type
is appropriate. Natural languages are apt to show a characteristic
more like 3, and this makes them difficult to encipher with
security by simple means.
Examples:
1. A language in which successive letters are independent
but with different probabilities has a characteristic of
Type 1.
2. Consider a language constructed as follows. First select
$26^8$ different sequences of letters, each 16 letters long,
from the $26^{16}$ possible sequences of this length. This
should be a random selection. The 16-letter sequences
chosen are the "words" of the language. Messages are
random sequences of these "words." Such a language has
a characteristic like the Curve 5.
3. A language with digram structure only, such as Example C
in Section 2 above, has a characteristic of the Type 2 in
Fig. 5, reaching its final value at N = 2.
4. English has the characteristic 3 in Fig. 5.
The redundancy characteristic describes how the
structure in the language is spread out. If the structure is
localized, the curve rises rapidly to its final value. If
there are long range influences the asymptotic value is
approached more slowly. If the structure is "locally random"
the curve will remain near zero for small N.
8. Secrecy Systems
Before we can apply any mathematical analysis to
secrecy systems, it is necessary to idealize the situation
suitably, and to define in a mathematically acceptable way
what we shall mean by a secrecy system. A "schematic" diagram
of a general secrecy system is shown in Fig. 6. At the
transmitting end there are two information sources: a message source
and a key source. The key source produces a particular key from
among those which are possible in the system. This key is
transmitted by some means, supposedly not interceptible, e.g. by
messenger, to the receiving end. The message source produces a
message (the "clear") which is enciphered, and the resulting
cryptogram sent to the receiving end by a possibly interceptible
means, for example radio. At the receiving end the cryptogram
and key are combined in the decipherer to recover the message.
Evidently the encipherer performs a functional
operation. If M is the message, K the key, and E the enciphered
message, or cryptogram, we have
    $$E = f(M, K)$$
i.e. E is a function of M and K. We prefer to think of this,
however, not as a function of two variables but as a (one
parameter) family of operations or transformations, and we write it
    $$E = T_i M .$$
The transformation $T_i$ applied to message M produces cryptogram E.
The index i corresponds to the particular key being used. If
there are m possible keys there will be m transformations in the
family $T_1, T_2, \ldots, T_m$.
At the receiving end it must be possible to recover
M, knowing E and K. Thus the transformations in the family
must have unique inverses
    $$M = T_i^{-1} E$$
at any rate this inverse must exist uniquely for every E which
can be obtained from an M with key i.
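A toy instance may make the "family of transformations" concrete. Everything specific in this sketch is hypothetical: a four-letter alphabet, three keys given as permutations of it, and the key probabilities. It only illustrates that each $T_i$ has a unique inverse and that the key is drawn according to a system of probabilities.

```python
import random

ALPHABET = "ABCD"
KEYS = {1: "BCDA", 2: "DABC", 3: "CDAB"}      # hypothetical substitutes under each key
KEY_PROBS = {1: 0.5, 2: 0.25, 3: 0.25}        # hypothetical a priori probabilities


def encipher(i, message):
    """E = T_i M: replace each letter by its substitute under key i."""
    table = dict(zip(ALPHABET, KEYS[i]))
    return "".join(table[letter] for letter in message)


def decipher(i, cryptogram):
    """M = T_i^{-1} E: apply the inverse table of key i."""
    inverse = dict(zip(KEYS[i], ALPHABET))
    return "".join(inverse[letter] for letter in cryptogram)


def choose_key(rng):
    """The key source: choose a key according to the key probabilities."""
    return rng.choices(list(KEY_PROBS), weights=list(KEY_PROBS.values()))[0]


key = choose_key(random.Random(5))
cryptogram = encipher(key, "ABCDDCBA")
print(key, cryptogram, decipher(key, cryptogram))
```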
The key source can be thought of as a "probability
machine," something which chooses from the possible keys
according to a system of probabilities. Mathematically then, the
keys (or the parameter of the family of transformations) belong
to a probability or measure space. Hence we arrive at the
definition:
A secrecy system is a family of uniquely reversible
transformations $T_i$ of a message space $\Omega_M$ into a cryptogram
space $\Omega_E$, the parameter i belonging to a probability space $\Omega_K$.
Conversely any set of entities of this type will be called a
"secrecy system."
The system can be visualized mechanically as a
machine with one or more controls on it. A sequence of letters,
the message, is fed into the input of the machine and a second
series emerges at the output. The particular setting of the
controls corresponds to the particular key being used. Some
method must be prescribed for choosing the key from all the
possible ones.
To make the problem mathematically tractable we shall
assume that the enemy knows the system being used. That is, he
knows the family of transformations $T_i$, and the probabilities
of choosing various keys.
One might object to this as being unrealistic, in that
the cryptanalyst often does not know what system was used or the
probabilities of various keys. There are two answers to this
objection.
1. The assumption is actually the one ordinarily used
in cryptographic studies. It is pessimistic and
hence safe, but in the long run realistic (particularly
in military work), since one must expect his
system to be found out eventually through espionage,
captured equipment, prisoners, etc. Thus, even when
an entirely new system is devised, so that the enemy
cannot assign any a priori probability to it without
discovering it himself, one must still live with the
expectation of his eventual knowledge.
2. The restriction is much weaker than appears at first,
due to our broad definition of what constitutes the
system. Suppose a cryptographer intercepts a message
and does not know whether a substitution, transposition,
or Vigenère type cipher was used. He can consider
this as being enciphered by a system in which
part of the key is the specification of which of these
types was used, the next part being the particular
key for that type. These three different possibilities
are assigned probabilities according to his
best guesses of the a priori probabilities of the
encipherer using the respective types of cipher.
A second possible objection to our definition of
secrecy systems is that no account is taken of the common
practice of inserting nulls in a message and the use of
multiple substitutes. Thus there is not a unique $E = T_i M$, but
actually the encipherer can choose at will among a number of
different E's for the same message and key. This situation
could be handled, but would only add complexity at the present
stage, without altering any of the basic results. To define
the more general secrecy system, one would add a second
parameter to the transformations $T_i$, which corresponds to the
various choices of cryptograms corresponding to a given
message and key. It is possible, but not always desirable, to
consider this second parameter as part of the key, since it
does not need to be transmitted to the receiving point.
We also assume that the enemy is in possession of the
measure in the space $\Omega_M$, the a priori probabilities of various
messages. The same objection and essentially the same answers
might be given to this assumption as to his knowledge of the
transformations $T_i$. This measure, however, we do not consider
as part of the secrecy system for reasons which will appear
later. The secrecy system whose transformations are $T_i$ will
be denoted by T and this concept includes the space $\Omega_M$ on
which T operates (without its measure), the transformations $T_i$,
and the spaces $\Omega_K$ and $\Omega_E$, the former with its probability
measure.
If the messages are produced by a Markoff process
of the type described previously, the probabilities of various
messages are determined by the structure of the Markoff process.
For the present, however, we wish to take a more general view
of the situation and regard the messages as merely an abstract
set of entities with associated probabilities, not necessarily
composed of a sequence of letters and not necessarily produced
by a Markoff process.
It should be emphasized that throughout the paper a
secrecy system means not one but a set of many transformations.
After the key is chosen only one of these transformations is
used and we might be led to define a secrecy system as a single
transformation on a language.* The enemy, however, does not
know what key was chosen and the "might have been" keys are as
important for him as the actual one. Indeed it is only the
existence of these other possibilities that gives the system
any secrecy. Since the secrecy is our primary interest, we
are forced to this rather elaborate concept of a secrecy
system. This type of situation where possibilities are as
important as actualities is almost the rule in games of
strategy. The course of a chess game is largely controlled
by threats which are not carried out. See also the "virtual
existence" of unrealized imputations in von Neumann's theory
of games.

*A. A. Albert in a paper presented at a Manhattan, Kansas,
meeting of the American Mathematical Society (Nov. 22, 1941)
entitled "Some Mathematical Aspects of Cryptography" has
defined a ciphering system in this way. With this limited
definition about all one can do is to describe and classify
from the mathematical point of view various types of
transformations.
There are a number of difficult epistemological
questions connected with the theory of secrecy, or in fact
with any theory which involves questions of probability
(particularly a priori probabilities, Bayes' theorem, etc.)
when applied to a physical situation. Treated abstractly,
probability theory can be put on a rigorous logical basis
with the modern measure theory approach.* As applied to
reality, however, especially when "subjective" probabilities
and unrepeatable experiments are concerned, there are many
questions of logical validity. For example in the approach
to secrecy made here, a priori probabilities of various keys
are assumed known by the enemy cryptographer. How can one
determine operationally if his estimates are correct, on the
basis of his knowledge of the situation?
It may happen that the keys are chosen by the
encipherer according to one system of probabilities, i.e. one
measure in the key space $\Omega_K$, and that the enemy cryptanalyst
estimates a second different system of probabilities $\Omega'_K$ in
this space which are entirely reasonable in the light of
his knowledge of the situation. Which is correct? I
believe that both are correct. The calculation based on $\Omega_K$
leads to the solution when the enemy knows just how the
keys are chosen and the solution based on $\Omega'_K$ leads to
solutions which are correct for a situation agreeing with the
enemy's knowledge of the actual situation. It appears
intuitively that the enemy's lack of knowledge can only do
him harm, and probably this can be proved, but this question
has not been investigated. In fact, we assume only one
measure $\Omega_K$ in the key space. Similar remarks may be made
regarding measure in the message space $\Omega_M$.
*See J. L. Doob, "Probability as Measure," Annals of Math.
Stat., v. 12, 1941, pp. 206-214.
A. Kolmogoroff, "Grundbegriffe der Wahrscheinlichkeitsrechnung,"
Ergebnisse der Mathematik, v. 2, No. 3 (Berlin, 1933).
Actually in practical situations, only extreme
errors in a priori probabilities of keys and messages cause
much error in the important parameters. This is because of
the exponential behavior of the number of messages, etc.,
and the logarithmic measures employed.
With regard to the application of the mathematical
theory of probability to physical situations there are two
main theories or ways of setting up the correspondence. (1)
The frequency theory. Probability is correlated with relative
frequency of an event. This is the correspondence used by
the practicing statistician, in principle by the physicist,
etc. (2) The degree of belief approach. Probability is a
subjective phenomenon and measures one's degree of belief in
the occurrence of an event. This approach is seen often in
the work of historians, judges, and in everyday life.
Although this latter approach has often been attacked as
meaningless we cannot agree with this opinion. In the first place
the intuitive approach can be given a rigorous mathematical
formulation. This has been done in a very elegant way by
B. O. Koopman.* Essentially one need only assume that a person
be capable of making probability judgments (Event A is more or
less probable than event B or they are equiprobable) and that
his judgments be self consistent (e.g. if he judges A more
probable than B and B more probable than C he should judge A
more probable than C). One can even establish numerical values
by the use of a "standard gauge," for example a roulette wheel,
and thus relate the subjective and the frequency probabilities.
In the second place, on pragmatic grounds one can hardly deny
the subjective applications, since almost all of our everyday
decisions are based on this sort of probability judgment.
Cryptographic work involves both types of applications. In
the use of frequency tables, significance tests etc., the
cryptanalyst is following the frequency approach. In the
"intuitive" methods of cryptanalysis (probable words etc.) the
degree of belief approach is more in evidence.
We may remark that a single operation on a
language which is reversible forms a degenerate type of secrecy
system under our definition: a system with only one key of
unit probability. Such a system has no secrecy; the
cryptanalyst finds the message by applying the inverse of this
transformation, the only one in the system, to the intercepted
cryptogram. The decipherer and cryptanalyst in this case
possess the same information. In general, the only difference
between the decipherer's knowledge and the enemy cryptanalyst's
knowledge is that the decipherer knows the particular key being
used, while the cryptanalyst only knows the a priori
probabilities of the various keys in the set. The process of
deciphering is that of applying the inverse of the particular
transformation used in enciphering to the cryptogram. The process
of cryptanalysis is that of attempting to determine the message
(or the particular key) given only the cryptogram and the
a priori probabilities of various keys and messages.

*B. O. Koopman, "The Axioms and Algebra of Intuitive
Probability," Annals of Mathematics, v. 41, no. 2, 1940,
p. 269. "Intuitive Probabilities and Sequences," v. 42,
no. 1, 1941, p. 169.
A system will be called "closed" if any possible
cryptogram can be deciphered with any possible key. This means
that the inverse transformations $T_i^{-1}$ are all defined for each
element in the cryptogram space.
We shall use the notation |M| for the "size" of the
message space:
    $$|M| = -\sum P(M) \log P(M)$$
where P(M) is the probability of message M and the sum is over
all messages of just N letters. Thus |M| is a function of N
and measures the amount of "choice" in the selection of an N
letter message. For large N, |M| is approximately RN.
Similarly |K| is the size of the key space
    $$|K| = -\sum P(K) \log P(K)$$
the sum being over all keys.
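A small numerical check of these definitions follows. It assumes, for illustration only, the four-letter language with independent letters of probabilities 1/2, 1/4, 1/8, 1/8 (rate 7/4 alternatives per letter) and a hypothetical system with 26 equally likely keys; logarithms are to base 2. For independent letters the relation |M| = RN holds exactly, not merely for large N.

```python
from itertools import product
from math import log2

LETTERS = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}


def size(probabilities):
    """Size of a space: minus the sum of P log P over its elements."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def message_space_size(n):
    """|M| for all messages of just n independently chosen letters."""
    message_probabilities = []
    for message in product(LETTERS, repeat=n):
        prob = 1.0
        for letter in message:
            prob *= LETTERS[letter]
        message_probabilities.append(prob)
    return size(message_probabilities)


print(message_space_size(3))      # compare with R * N = (7/4) * 3
print(size([1 / 26] * 26))        # |K| for 26 equally likely keys
```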
9. Representation of Systems
A secrecy system can be represented in various ways.
One which is convenient for illustrative purposes is a line
diagram, as in Figs. 7, 10, 11. The possible messages are
represented by points at the left and the possible cryptograms
by points at the right. If a certain key, say key 1,
transforms message $M_2$ into cryptogram $E_3$, then $M_2$ and $E_3$ are
connected by a line labeled 1, etc. From each possible message
there must be exactly one line emerging for each different
key.
A second representation is by means of a rectangular
array. This may be done in three different ways. For the
closed system of Fig. 7, the three arrays are as follows:

         K   1    2    3
    M
    M1       E1   E4   E2
    M2       E3   E1   E4
    M3       E4   E3   E1
    M4       E2   E2   E3

         E   E1   E2    E3   E4
    M
    M1       1    3     -    2
    M2       2    -     1    3
    M3       3    -     2    1
    M4       -    1,2   3    -

         K   1    2    3
    E
    E1       M1   M2   M3
    E2       M4   M4   M1
    E3       M2   M3   M4
    E4       M3   M1   M2
Thus key 1 transforms $M_2$ into $E_3$ and either key 1 or key 2
transforms $M_4$ into $E_2$. From the third array $E_3$ is deciphered
by key 2 to give $M_3$. All of these arrays and the line diagram
contain equivalent information; from any one the others can be
derived.
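The statement that the arrays carry equivalent information can be illustrated by deriving two of them from the third. The entries of FIRST below are one reading of the scanned array for the system of Fig. 7 and should be treated as an assumption; the function names are hypothetical.

```python
# Cryptogram for each message (row) and key (column); entries are an
# assumed reading of the scanned array, not a verified transcription.
FIRST = {
    "M1": {1: "E1", 2: "E4", 3: "E2"},
    "M2": {1: "E3", 2: "E1", 3: "E4"},
    "M3": {1: "E4", 2: "E3", 3: "E1"},
    "M4": {1: "E2", 2: "E2", 3: "E3"},
}


def keys_by_message_and_cryptogram(first):
    """Second array: which keys carry a given message into a given cryptogram."""
    second = {}
    for message, row in first.items():
        for key, cryptogram in row.items():
            second.setdefault(message, {}).setdefault(cryptogram, []).append(key)
    return second


def messages_by_cryptogram_and_key(first):
    """Third array: the message obtained by deciphering a cryptogram with a key."""
    third = {}
    for message, row in first.items():
        for key, cryptogram in row.items():
            third.setdefault(cryptogram, {})[key] = message
    return third


print(keys_by_message_and_cryptogram(FIRST)["M4"])
print(messages_by_cryptogram_and_key(FIRST)["E3"])
```

In the third array every cryptogram has an entry under every key, which is what it means for this system to be closed.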
The line diagram and the arrays describe the set of
transformations. To completely specify the system the
probabilities of various keys must also be given. This may be done
by merely listing the keys with their probabilities.
Similarly the message space is not completely specified until
the probabilities of the various messages are given.
A more common way to describe
the set of transformations is to describe what operations one
performs on the message for an arbitrary key to obtain the
cryptogram. Similarly one defines implicitly the probabilities for
various keys by describing how a key is chosen, or what we know
of the enemy's habits of key choice. The probabilities for
messages are implicitly determined by stating our a priori
knowledge of the enemy's language habits, the tactical
situation (which will influence the probable content of the message)
and any special information we may have regarding the cryptogram.
10. Notation
M
K
E
V
The following notetioa „m generally be followed,
the encipher&d message or cryDtourrm
t%Zll&&\Tctnls -S^SSW probabilUlee, . ^
SbXi^W* ProbaMlitles. also 4
3 » the cryptogram space, also a probability space, sine-
the probabilities in 3L, and induce probabilities
CL/.for each cryptogram,
th
m, ■ the i letter of the message
e^ * the i'tti letter of the cryptogram
k^ « the itn letter of the key when it can be so describe
Generally P stands for a probability. Conditional
probabilities are indicated with subscripts. Thus

P(M)    = probability of message M
P(E)    = probability of cryptogram E
P(K)    = probability of key K
P_M(E)  = conditional probability of E if message M is chosen
P_E(M)  = conditional probability of M if cryptogram E is
          intercepted, i.e. the a posteriori probability of M
          if E is observed
Q       = equivocation, a concept to be defined precisely later,
          which measures the uncertainty of some knowledge
          defined only by probabilities. We also have conditional
          equivocations, thus Q_M(K) is the equivocation of the
          key knowing the message.
|K| = - Σ P(K) log P(K)   the size of the key space
|M| = - Σ P(M) log P(M)   the size of the message space
|E| = - Σ P(E) log P(E)   the size of the cryptogram space
m   = number of different keys
N   = number of intercepted letters
R_0 = maximum information rate for a language
R   = mean rate
D   = R_0 - R = redundancy of a language
T, R, S, etc. = secrecy systems
T_i, R_j, S_k, etc. = particular transformations of these
      systems
11. Some Examples of Secrecy Systems
In this section a number of examples of ciphers will
be given. These will often be referred to in the remainder of
the paper for illustrative purposes.
1. Simple Substitution Cipher.
In this cipher each letter of the message is replaced
by a fixed substitute, usually also a letter. Thus the message
    M = m_1 m_2 m_3 m_4 ...
becomes
    E = e_1 e_2 e_3 e_4 ...
The key is a permutation of the alphabet in which the first
letter is the substitute for A, the second is the substitute
for B, etc.
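2. Transposition (Fixed Period d).
The message is divided into groups of length d and a
permutation applied to the first group, the same permutation to
the second group, etc. The permutation is the key and can be
represented by a permutation of the first d integers. Thus for
d = 5 we might have 2 3 1 5 4 as the permutation. This means
that m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_10 ... becomes
m_2 m_3 m_1 m_5 m_4 m_7 m_8 m_6 m_10 m_9 ... . Sequential
application of two or more transpositions will be called compound
transposition. If the periods are d_1, d_2, ..., d_s it is clear
that the result is a transposition of period d, where d is
the least common multiple of d_1, d_2, ..., d_s.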
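
The fixed-period transposition and the period of a compound transposition can be sketched in Python as follows. This is only an illustration: the names `transpose` and `compound_period` are hypothetical, and leaving a short final group unchanged is an assumption, since the text does not say how incomplete groups are treated.

```python
from functools import reduce
from math import gcd


def transpose(message, perm):
    """Apply a fixed-period transposition to a string.

    perm is a permutation of 1..d; position j of each output group
    takes the letter found at position perm[j] of the input group.
    A final group shorter than d is copied unchanged (an assumption).
    """
    d = len(perm)
    out = []
    for start in range(0, len(message), d):
        group = message[start:start + d]
        if len(group) < d:
            out.extend(group)
        else:
            out.extend(group[p - 1] for p in perm)
    return "".join(out)


def compound_period(*periods):
    """Least common multiple of the individual periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)
```

With the permutation 2 3 1 5 4, `transpose("ABCDEFGHIJ", [2, 3, 1, 5, 4])` returns `"BCAEDGHFJI"`, and `compound_period(4, 6)` returns 12.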
3. Vigenere, and Variations.
In this cipher the key consists of a series of d
letters. These are written repeatedly below the message and the
two added modulo 26 (considering the alphabet numbered from
A = 0 to Z = 25). Thus
    e_i = m_i + k_i (mod 26)
where k_i is of period d in the index i.
For example with the key G A H we obtain
    message        N O W I S T H E ...
    repeated key   G A H G A H G A ...
    cryptogram     T O D O S A N E ...
The Vigenere of period 1 is called the Caesar cipher. It is a
simple substitution in which each letter of M is advanced a
fixed amount in the alphabet. This amount is the key, which
may be any number from 0 to 25. The so-called Beaufort and
Variant Beaufort are similar to the Vigenere, and encipher by
the equations
    e_i = k_i - m_i (mod 26)
    e_i = m_i - k_i (mod 26)
respectively. The Beaufort of period one is called the
reversed Caesar cipher.
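
The three enciphering equations of this family differ only in how the message letter and key letter are combined, which the following Python sketch tries to make explicit. The helper `_combine` and the three function names are hypothetical; only upper-case letters A to Z are handled.

```python
A = ord("A")


def _combine(message, key, rule):
    """Apply rule(m_i, k_i) mod 26 letter by letter, repeating the key."""
    out = []
    for i, ch in enumerate(message):
        m = ord(ch) - A
        k = ord(key[i % len(key)]) - A
        out.append(chr(rule(m, k) % 26 + A))
    return "".join(out)


def vigenere(message, key):
    # e_i = m_i + k_i (mod 26)
    return _combine(message, key, lambda m, k: m + k)


def beaufort(message, key):
    # e_i = k_i - m_i (mod 26)
    return _combine(message, key, lambda m, k: k - m)


def variant_beaufort(message, key):
    # e_i = m_i - k_i (mod 26)
    return _combine(message, key, lambda m, k: m - k)
```

`vigenere("NOWISTHE", "GAH")` returns `"TODOSANE"`, in agreement with the example above. The Variant Beaufort with the same key undoes the Vigenere, and the Beaufort undoes itself.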
The application of two or more Vigeneres in sequence
will be called the compound Vigenere. It has the equation
    e_i = m_i + k_i + l_i + ... + s_i (mod 26)
where k_i, l_i, ..., s_i in general have different periods.
The period of their sum
    k_i + l_i + ... + s_i
as in compound transposition, is the least common multiple of
the individual periods.
4. Vernam System*
When the Vigenere is used with an unlimited key,
never repeating, we have the Vernam system, with
    e_i = m_i + k_i (mod 26)
the k_i being chosen at random and independently among 0, 1,
..., 25. If the key is a meaningful text we have the "running
key" cipher.
5. Bazeries Cylinder.
In this mechanical system 25 thick disks are used,
each having a mixed alphabet stamped around the edge. These
disks can be arranged in any order on a spindle, and the
particular arrangement used constitutes the key. With the disks
in their proper order, a message is enciphered by turning the
disks so that the message appears on a line parallel to the
axis of the spindle. Any other line of letters may then be
chosen for the cryptogram. To decipher, the cryptogram is
arranged on a line and the decipherer looks for another line
which then makes sense.
*G. S. Vernam, "Cipher Printing Telegraph Systems for Secret
Wire and Radio Telegraphic Communications," Journal American
Inst. of Elect. Eng., v. XLV, pp. 109-115, 1926.
6. Digram, Trigram, and N-gram Substitution.
Rather than substitute for letters one can substitute
for digrams, trigrams, etc. General digram substitution
requires a key consisting of a permutation of the 26^2 digrams.
It can be represented by a table in which the row corresponds
to the first letter of the digram and the column to the second
letter, entries in the table being the substitutes (usually
also digrams).
7. Interrupted Key Vigenere.
The Vigenere and its variations can be used with an
interrupted key. The sequence of key letters is started again
at irregularly spaced points. Thus if the entire key sequence
is X P G H F T R S one can interrupt irregularly to get
X .P OH F TI H X P Gfi ? lE'XPlPO » • •
The points of interruption can be determined in various ways:
(1) Whenever a certain letter occurs in the clear. (2)
Whenever a certain letter occurs in the cryptogram. (3) An
interrupting letter, say J, can be reserved as a signal and
the encipherer interrupts the key at his discretion. (4) No
signal is used and the decipherer locates the interruptions
by the appearance of meaningless text in the decipherment.
In place of starting the key again at each interruption one
can omit letters of it or reverse the direction of progress.
There are many variations and combinations of these methods.
8. Single Mixed Alphabet Vigenere.
This is a simple substitution followed by a
Vigenere.
    e_i = f(m_i) + k_i
The "inverse" of this system is a Vigenere followed by simple
substitution
    e_i = g(m_i + k_i)
    m_i = g^-1(e_i) - k_i
9. Vigenere with Progressing Key.
The period of a Vigenere can be expanded by adding a
fixed number t to the key at each appearance; thus the n-th group
is enciphered by the equation
    e_i = m_i + k_i + nt
Also this can be varied by adding t and s alternately to the
key, etc.
10. Matrix System*
One method of n-gram substitution is to operate on
successive n-grams with a matrix having an inverse. The letters
are assumed numbered from 0 to 25, making them elements of an
algebraic ring. From the n-gram m_1 m_2 ... m_n of message, the
matrix a_ij gives an n-gram of cryptogram
    e_i = Σ_{j=1}^{n} a_ij m_j        i = 1, ..., n
The matrix a_ij is the key, and deciphering is performed with
the inverse matrix. The inverse matrix will exist if and only
if the determinant |a_ij| has an inverse element in the ring.
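
For the smallest case n = 2 the condition on the determinant and the deciphering step can be sketched in Python as below. The example matrix and the function names are hypothetical choices for illustration. In the ring of integers mod 26 an element has an inverse exactly when it is prime to 26, which is the test used in `invertible`; `pow(d, -1, 26)` needs Python 3.8 or later.

```python
from math import gcd

N = 26


def det2(a):
    """Determinant of a 2 x 2 matrix, reduced mod 26."""
    return (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % N


def invertible(a):
    """True when the determinant has an inverse element mod 26."""
    return gcd(det2(a), N) == 1


def inverse2(a):
    """Inverse of a 2 x 2 matrix mod 26 (raises ValueError if none exists)."""
    d_inv = pow(det2(a), -1, N)
    return [[(a[1][1] * d_inv) % N, (-a[0][1] * d_inv) % N],
            [(-a[1][0] * d_inv) % N, (a[0][0] * d_inv) % N]]


def apply(a, digram):
    """e_i = sum_j a_ij m_j (mod 26) for one digram of letter numbers."""
    return [sum(a[i][j] * digram[j] for j in range(2)) % N for i in range(2)]
```

With the key `[[3, 3], [2, 5]]` the determinant is 9, which is prime to 26, so every enciphered digram is recovered by applying `inverse2` of the key. A matrix such as `[[2, 0], [0, 1]]` has determinant 2 and cannot serve as a key.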
11. The Playfair Cipher.
This is a particular type of digram substitution
governed by a mixed 25 letter alphabet written in a 5 x 5
square. (The letter J is often dropped in cryptographic work;
it is very infrequent, and when it occurs can be replaced by I.)
Suppose the key square is as shown below
    L Z Q C P
    A G N O U
    R D M I F
    K Y H V S
    X B T E W
*See L. S. Hill, "Cryptography in an Algebraic Alphabet,"
American Math. Monthly, v. 36, No. 6, 1, 1929, pp. 306-312.
Also "Concerning Certain Linear Transformation Apparatus of
Cryptography," v. 38, No. 3, 1931, pp. 135-154.
The substitute for a digram AC, for example, is the pair of
letters at the other corners of the rectangle defined by A
and C, i.e. LO, the L taken first since it is above A. If the
digram letters are on a horizontal line as RI, one uses the
letters to their right DF; RF becomes DR. If the letters are
on a vertical line, the letters below them are used. Thus PS
becomes UW. If the letters are the same nulls may be used to
separate them or one may be omitted, etc.
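
The three Playfair rules can be sketched in Python as follows. The key square is taken to be L Z Q C P / A G N O U / R D M I F / K Y H V S / X B T E W, which is an assumed reading of the square in the text, and the function names are hypothetical. A digram of two equal letters is rejected, since the text leaves its treatment open.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]


def locate(square, letter):
    """Return (row, column) of a letter in the 5 x 5 key square."""
    for r, row in enumerate(square):
        c = row.find(letter)
        if c >= 0:
            return r, c
    raise ValueError("letter not in square: " + letter)


def playfair_digram(square, a, b):
    """Substitute for the digram ab under the rules described above."""
    if a == b:
        raise ValueError("equal letters need nulls or omission")
    ra, ca = locate(square, a)
    rb, cb = locate(square, b)
    if ra == rb:   # horizontal line: letters to the right, cyclically
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:   # vertical line: letters below, cyclically
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    # rectangle: the corner in the column of a comes first
    return square[rb][ca] + square[ra][cb]
```

Under the assumed square, `playfair_digram(SQUARE, "A", "C")` returns `"LO"`, RI gives DF, RF gives DR, and the vertical pair PS gives UW.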
12. Multiple Mixed Alphabet Substitution.
In this cipher there are a set of d simple substitutions
which are used in sequence. If the period d is four
    m_1 m_2 m_3 m_4 m_5 m_6 ...
becomes
    f_1(m_1) f_2(m_2) f_3(m_3) f_4(m_4) f_1(m_5) f_2(m_6) ...
13. Autokey Cipher.
A Vigenere type system in which either the message
itself or the resulting cryptogram is used for the "key" is
called an autokey cipher. The encipherment is started with
a "priming key" (which is the entire key in our sense) and
continued with the message or cryptogram displaced by the
length of the priming key as indicated below with the priming
key COMET. The message used as "key":
    MESSAGE      S E N D S U P P L I E S ...
    KEY          C O M E T S E N D S U P ...
    CRYPTOGRAM   U S Z H L M T C O A Y H ...
The cryptogram used as "key":
    MESSAGE      S E N D S U P P L I E S ...
    KEY          C O M E T U S Z H L O H ...
    CRYPTOGRAM   U S Z H L O H O S T S ...
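
Both forms of the autokey cipher can be sketched in a few lines of Python. The function names are hypothetical and only upper-case letters are handled; the enciphering rule is the Vigenere addition mod 26.

```python
A = ord("A")


def _add(m, k):
    """Vigenere addition of two letters mod 26."""
    return chr((ord(m) - A + ord(k) - A) % 26 + A)


def autokey_message(message, priming_key):
    """Autokey with the message itself continuing the key."""
    key = priming_key + message
    return "".join(_add(m, k) for m, k in zip(message, key))


def autokey_cryptogram(message, priming_key):
    """Autokey with the resulting cryptogram continuing the key."""
    key = list(priming_key)
    out = []
    for i, m in enumerate(message):
        e = _add(m, key[i])
        out.append(e)
        key.append(e)   # each cryptogram letter becomes later key
    return "".join(out)
```

With the priming key COMET and the message SENDSUPPLIES, `autokey_message` returns `"USZHLMTCOAYH"` and `autokey_cryptogram` returns `"USZHLOHOSTSZ"`, in agreement with the two examples above.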
14. Fractional Ciphers.
In these, each letter is first enciphered into two
or more letters or numbers and these symbols are somehow mixed
(e.g. by transposition). The result may then be retranslated
into the original alphabet. Thus using a mixed 25 letter
alphabet for the key we may translate letters into two digit
quinary numbers by the table
        0 1 2 3 4
    0   L Z Q C P
    1   A G N O U
    2   R D M I F
    3   K Y H V S
    4   X B T E W
Thus B becomes 41. After the resulting series of numbers is
transposed in some way they are taken in pairs and translated
back into letters.
15. Codes.
In codes words (or sometimes syllables) are replaced
by substitute letter groups. Sometimes a cipher of one kind or
another is applied to the result.
12. Valuations of Secrecy Systems
There are a number of different criteria that should
be applied in estimating the value of a proposed secrecy system.
The more important of these are:
1. Amount of Secrecy.
There are some systems that are perfect: the enemy
is no better off after intercepting any amount of material than
before. Other systems, although giving him some information,
do not yield a unique "solution" to intercepted cryptograms.
Among the uniquely solvable systems, there are wide variations
in the amount of labor required to effect this solution and in
the amount of material that must be intercepted to make the
solution unique.
2. Size of Key.
The key must be transmitted by non-interceptible
means from transmitting to receiving ends. Sometimes it must
be memorized. It is desirable then to have the key as small
as possible.
3. Complexity of Enciphering and Deciphering Operations.
These should, of course, be as simple as possible.
If they are done manually, complexity leads to loss of time,
errors, etc. If done mechanically, complexity leads to large
expensive machines.
4. Propagation of Errors.
In certain types of secrecy systems an error of one
letter in enciphering or transmission leads to a large amount
of error in the deciphered text. The errors are spread out by
the deciphering operation, causing the loss of much information
and frequent need for repetition of the cryptogram. It is
naturally desirable to minimize this error expansion.
5. Expansion of Message.
In some types of secrecy systems the size of the
message is increased by the enciphering process. This
undesirable effect may be seen in systems where one attempts to
swamp out message statistics by the addition of many nulls, or
where multiple substitutes are used. It also occurs in many
"concealment" types of systems (which are not usually secrecy
systems in the sense of our definition).
13. Equivalence Classes in the Key Space
It may happen that in a ciphering system two or more
different keys, say keys 1, 2, and 7, are equivalent. By this
we mean that for every M
    T_1 M = T_2 M = T_7 M
These keys will not be considered as distinct but will be thrown
into an equivalence class. It is clear that the cryptanalyst
can never determine which particular one of these was used but
only (at best) the class. The probability for the class is of
course the sum of the probabilities of the different keys in
the class.
As an example, in the Playfair cipher as described
above, the following are equivalent key squares.
    G H X P Y        E C I Z F
    Z F E C I        N R D L O
    L O N R D        V S Q T A
    T A V S Q        W B M K U
    K U W B M        X P Y G H
We can think of the possible equivalence classes in this case
as arrangements of a 25 letter alphabet on a 5 x 5 square
on an oriented torus. The number of different keys is not 25!
but 25!/5^2 = 24!.
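
The torus picture can be checked directly: shifting the rows and columns of a Playfair key square cyclically leaves every digram substitution unchanged. The Python sketch below assumes the enciphering rules of the Playfair example (letters to the right on a row, letters below on a column, otherwise the rectangle corner in the first letter's column comes first) and uses an assumed reading of that example's key square; all names are hypothetical.

```python
from math import factorial

SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]


def encipher(square, a, b):
    """Playfair substitute for the digram ab (a != b)."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    return square[rb][ca] + square[ra][cb]


def rotate(square, dr, dc):
    """Shift the square cyclically by dr rows and dc columns."""
    return ["".join(square[(r + dr) % 5][(c + dc) % 5] for c in range(5))
            for r in range(5)]


def equivalent(sq1, sq2):
    """True when both squares encipher every digram identically."""
    letters = "".join(sq1)
    return all(encipher(sq1, a, b) == encipher(sq2, a, b)
               for a in letters for b in letters if a != b)
```

All 25 cyclic shifts of a square pass `equivalent`, while exchanging two letters of the square does not. This is why each class contains 5^2 squares, and `factorial(25) // 25 == factorial(24)`.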
When we say that two secrecy systems are the same we
mean that they consist of the same set of transformations T_i
with the same message and cryptogram space (range and domain)
and the same probabilities for the different keys (after all
identical transformations are put in the same equivalence
class).
14. The Algebra of Secrecy Systems
If we have two secrecy systems T and R we can often
combine them in various ways to form a new secrecy system S.
If T and R have the same domain (message space) we may form a
kind of "weighted sum,"
    S = p T + q R
where p + q = 1. This operation consists of first making a
preliminary choice with probabilities p and q determining
which of T and R is used. This choice is part of the key of S.
After this is determined T or R is used as originally defined.
The total key of S must specify which of T and R is used and
which key of T (or R) is used.
If T consists of the transformations T_1, ..., T_m
with probabilities p_1, ..., p_m and R consists of R_1, ...,
R_k with probabilities q_1, ..., q_k then S = p T + q R consists
of the transformations T_1, T_2, ..., T_m, R_1, ..., R_k with
probabilities p p_1, p p_2, ..., p p_m, q q_1, q q_2, ..., q q_k
respectively.
More generally we can form the sum of a number of
systems.
    S = p_1 T + p_2 R + ... + p_m U        Σ p_i = 1
We note that any system T can be written as a sum of fixed
operations
    T = p_1 T_1 + p_2 T_2 + ... + p_m T_m
T_i being a definite enciphering operation of T corresponding to
key choice i, which has probability p_i.
A second way of combining two secrecy systems is by
taking the "product", shown schematically in Fig. 8. Suppose
T and R are two systems and the domain (language space) of T
can be identified with the range (cryptogram space) of R. Then
we can apply first R to our language and then T to the result
of this enciphering process. This gives a resultant operation S
which we write as a product
    S = T R
The key for S consists of both keys of T and R which are assumed
chosen according to their original probabilities and
independently. Thus if the m keys of T are chosen with
probabilities
    p_1 p_2 ... p_m
and the n keys of R have probabilities
    p'_1 p'_2 ... p'_n
then S has mn keys (at most; there may and often will be
equivalence classes) with probabilities p_i p'_j. This type of
product encipherment is often used; for example one
follows a substitution by a transposition or a transposition
by a Vigenere, or applies a code to the text and enciphers
the result by substitution, transposition, fractionation, etc.
A more special type of product may be defined in
case both T and R have keys of the same size which may be put
in one-to-one correspondence with the same probabilities for
corresponding keys. This may be called the "inner product,"
in contrast with the above which may be more completely
described as an "outer product" (these names are derived from
a rough analogy with the concepts of tensor analysis). In
the inner product, written
    S = T ∘ R
and indicated schematically in Fig. 9, the same key (or
corresponding keys) are used for both T and R chosen with the
common probability.
For example one may construct a transposition cipher
whose key is a permutation of the alphabet, each permutation
being equally likely, and apply first this and then a
substitution based on the same permutation. One also sees this
situation in certain geometrical types of transposition ciphers
where the text is written into a square and a permutation based
on a key word applied first to the columns and then the rows
of the square.
It may be noted that multiplication (either kind) is
not in general commutative (we do not always have R S = S R)
although in special cases such as substitution and transposition
it is. Since it represents an operation it is definitionally
associative. That is R(ST) = (RS)T = RST. Furthermore we have
the laws
    p (p' T + q' R) + q S = p p' T + p q' R + q S
(weighted associative law for addition)
    T (p R + q S) = p T R + q T S
    (p R + q S) T = p R T + q S T
(right and left hand distributive laws)
and
    p_1 T + p_2 T + p_3 R = (p_1 + p_2) T + p_3 R
Finally with regard to this algebraic structure of
secrecy operations, we note that every closed secrecy system
has an "inverse" T' obtained by interchanging the E and M
spaces, with key probabilities the same, and
    (T R S)' = S' R' T'
    (p T + q R)' = p T' + q R'
Note that T T' is not in general the identity (this is the
reason we do not write T^-1).
A system whose M and E spaces can be identified,
a very common case as when letter sequences are transformed
into letter sequences, may be termed endomorphic. An
endomorphic system T may be raised to a power T^n.
A secrecy system T whose outer product with itself
is equal to T, i.e. for which
    T T = T
will be called idempotent. For example simple substitution,
transposition of period p, Vigenere of period p (all with each
key equally likely) are idempotent.
The set of all endomorphic secrecy systems defined in
a fixed message space constitute an "algebraic variety," that
is, a kind of algebra, using the operations of addition and
multiplication. In fact, the properties of addition and
multiplication which we have discussed lead to the following
result.
Theorem 1: The set of endomorphic ciphers with the same
           message space and the two combining operations
           of weighted addition and outer multiplication
           form a linear associative algebra with a unit
           element, apart from the fact that the
           coefficients in a weighted addition must be
           non-negative and sum to unity.
It should be emphasized that these combining
operations of addition and multiplication apply to secrecy
systems as a whole. The product of two systems TR should not be
confused with the product of the transformations in the systems
T_i R_j, which also appears often in this work. The former TR
is a secrecy system, i.e. a set of transformations with
associated probabilities; the latter is a particular
transformation. Further the sum of two systems p R + q T is a
system; the sum of two transformations is not defined. The
systems T and R may commute without the individual T_i and R_j
commuting, e.g. if R is a Beaufort system of a given period,
all keys equally likely,
    R_i R_j ≠ R_j R_i
in general, but of course RR does not depend on its order;
actually
    RR = V
the Vigenere of the same period with random key. On the other
hand, if the individual T_i and R_j of two systems T and R
commute, then the systems commute.
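
The Beaufort remark can be verified by brute force for period one. The Python sketch below represents each transformation as a table on the letter numbers 0 to 25, forms all 26 x 26 products of two Beaufort transformations, and counts how often each resulting transformation occurs. The names are hypothetical, and the restriction to period one is a simplifying assumption.

```python
from collections import Counter

N = 26


def table(rule):
    """Tabulate a letter transformation as a tuple indexed by m."""
    return tuple(rule(m) % N for m in range(N))


BEAUFORT = [table(lambda m, k=k: k - m) for k in range(N)]
VIGENERE = [table(lambda m, k=k: m + k) for k in range(N)]


def compose(f, g):
    """Product f g: apply g first, then f."""
    return tuple(f[g[m]] for m in range(N))


# Every product R_i R_j, with multiplicity, for equally likely keys.
PRODUCTS = Counter(compose(BEAUFORT[i], BEAUFORT[j])
                   for i in range(N) for j in range(N))
```

The individual transformations do not commute (for instance `compose(BEAUFORT[1], BEAUFORT[2])` differs from the product in the other order), yet `PRODUCTS` contains exactly the 26 Vigenere transformations, each occurring 26 times, so the product system is the Vigenere with all keys equally likely.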
It is rather surprising to find an algebraic variety
with as much structure as a linear associative algebra in which
the elements have the complexity of ciphers. In Hilbert space
theory, for example, one has a linear associative algebra,
but the elements of the algebra are transformations. Here the
elements are sets of transformations with a probability space
associated with the transformation parameter.
These combining operations give us ways of
constructing many new types of secrecy systems from certain
ones, such as the examples given. We may also use them to
describe the situation facing a cryptanalyst when attempting to
solve a cryptogram of unknown type. He is, in fact, solving a
secrecy system of the type
    T = p_1 A + p_2 B + ... + p_r S + p' X        Σ p = 1
where the A, B, ..., S are known types of ciphers, with the p_i
their a priori probabilities in this situation, and p' X
corresponds to the possibility of a completely new unknown type
of cipher.
In weighted addition the key size of the result is
given by
    |K| = p |K_1| + q |K_2| - (p log p + q log q)
        = p |K_1| + q |K_2| + |K_3|
i.e. the weighted mean of the two keys plus the size of the
p, q key. This is only in case there are no equivalences;
if there are it will always be less.
For the outer product the key size is
    |K| ≤ |K_1| + |K_2|
with equality only when there are no equivalences. In the
inner product
    |K| ≤ |K_1| = |K_2|
with equality under the same condition.
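
The key-size formulas for the case of no equivalences can be checked numerically. In the Python sketch below the function names are hypothetical and logarithms are taken to base 2, which is an assumption; the text does not fix the base at this point, and the identities hold for any base.

```python
from math import log2


def size(probs):
    """Key size - sum P log P of a list of key probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def weighted_sum_keys(p, probs_t, probs_r):
    """Key probabilities of S = p T + q R when no keys are equivalent."""
    q = 1 - p
    return [p * x for x in probs_t] + [q * y for y in probs_r]


def outer_product_keys(probs_t, probs_r):
    """Key probabilities of the outer product when no keys are equivalent."""
    return [x * y for x in probs_t for y in probs_r]
```

For example, with key probabilities `[0.5, 0.25, 0.25]` for T and `[0.1, 0.9]` for R and p = 0.3, `size(weighted_sum_keys(...))` agrees with `0.3 * size(T) + 0.7 * size(R) + size([0.3, 0.7])`, and the outer product has size `size(T) + size(R)`.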
15. Pure and Mixed Ciphers
Certain types of ciphers, such as the simple
substitution, the transposition of a given period, the Vigenere
of a given period, the mixed alphabet Vigenere, etc. (all
with each key equally likely) have a certain homogeneity with
respect to key. Whatever the key, the enciphering, deciphering
and decrypting processes are essentially the same. This
may be contrasted with the cipher
    p S + q T
where S is a simple substitution and T a transposition of a
given period. In this case the entire system changes for
enciphering, deciphering and decryptment, depending on whether
the substitution or transposition was used.
The cause of the homogeneity in certain ciphers
stems from the group property; we notice that in the above
examples of homogeneous ciphers the product of any two
transformations in the set T_i T_j is equal to a third
transformation T_k in the set, while T_i S_j does not equal any
transformation in the cipher
    p S + q T
which contains only substitutions and transpositions, no
products.
We might define a "pure" cipher, then, as one whose
T_i formed a group. This, however, would be too restrictive
since it requires that the E space be the same as the M space,
i.e. that the system be endomorphic. The fractional
transposition is as homogeneous as the ordinary transposition
without being endomorphic. The proper definition is the
following: A cipher T is pure if for every T_i, T_j, T_k there
is a T_s such that
    T_i T_j^-1 T_k = T_s
and every key is equally likely. Otherwise the cipher is mixed.
The systems of Fig. 7 are mixed. Fig. 10 is pure if all keys
are equally likely.
Theorem 2: In a pure cipher the operations T_i^-1 T_j which
           transform the message space into itself form a
           group whose order is m, the number of different
           keys.
For
    T_j^-1 T_k T_k^-1 T_j = I
so that each element has an inverse, also the associative
law is true since these are operations, and the group
property follows from
    T_i^-1 T_j T_k^-1 T_l = T_s^-1 T_k T_k^-1 T_l = T_s^-1 T_l
using our assumption that T_i^-1 T_j = T_s^-1 T_k for some s.
The operation T_i^-1 T_j means, of course, enciphering
the message with key j and then deciphering with key i which
brings us back to the message space. If T is endomorphic,
i.e. the T_i themselves transform the space Ω_M into itself (as
is the case with most ciphers, where both the message space
and the cryptogram space consist of sequences of letters),
and the T_i are a group and equally likely, then T is pure,
since
    T_i T_j^-1 T_k = T_i T_r = T_s
Theorem 3: The outer product of two pure ciphers which
           commute is pure.
For if T and R commute T_i R_j = R_l T_m for every i, j with
suitable l, m, and
    T_i R_j (T_k R_l)^-1 T_m R_n = T_i R_j R_l^-1 T_k^-1 T_m R_n
                                 = R_u R_v^-1 R_w T_r T_s^-1 T_t
                                 = R_h T_g
The commutation condition is not necessary, however, for the
product to be a pure cipher.
A system with only one key, i.e. a single definite
operation T_1, is pure, since the only choice of indices is
    T_1 T_1^-1 T_1 = T_1
Thus the expansion of a general cipher into a sum of such
simple transformations also exhibits it as a sum of pure
ciphers.
An examination of the example of a pure cipher
shown in Fig. 5 discloses certain properties. The messages
fall into certain subsets which we will call residue classes
and the possible cryptograms are divided into corresponding
residue classes. There is at least one line from each message
in a class to each cryptogram in the corresponding class
and no line between classes which do not correspond. The
number of messages in a class is a divisor of the total
number of keys. The number of lines "in parallel" from a
message M to a cryptogram in the corresponding class is equal
to the number of keys divided by the number of messages in
the class containing the message (or cryptogram). It is shown
in the appendix that these hold in general for pure ciphers.
Summarized in a more formal statement we have
Theorem 4: In a pure system the messages can be divided into
           a set of "residue classes" C_1, C_2, ..., C_s and
           the cryptograms into a corresponding set of
           residue classes C'_1, C'_2, ..., C'_s with the
           following properties:
           (1) The message residue classes are mutually
               exclusive and collectively contain all
               possible messages. Similarly for the
               cryptogram residue classes.
           (2) Enciphering any message in C_i with any key
               produces a cryptogram in C'_i. Deciphering
               any cryptogram in C'_i with any key leads
               to a message in C_i.
           (3) The number of messages in C_i, say φ_i, is
               equal to the number of cryptograms
               in C'_i and is a divisor of k the number
               of keys.
           (4) Each message in C_i can be enciphered into
               each cryptogram in C'_i by exactly k/φ_i
               different keys. Conversely
               for decipherment.
The importance of the concept of a pure cipher (and
the reason for the name) lies in the fact that for them all
keys are essentially the same. Whatever key is used for
a particular message, the a posteriori probabilities of all
messages are identical. To see this, note that two different
keys applied to the same message lead to two cryptograms in
the same residue class, say C'_i. The two cryptograms
therefore could each be deciphered by k/φ_i keys into each
message in C_i and into no other possible messages. All keys
being equally likely the a posteriori probabilities of various
messages are thus
    P_E(M) = P(M) P_M(E) / P(E)
           = P(M) P_M(E) / Σ_M P(M) P_M(E)
           = P(M) / P(C_i)
where M is in C_i, E is in C'_i and the sum is over all messages
in C_i. If E and M are not in corresponding residue classes
P_E(M) = 0. Similarly it can be shown that the a posteriori
probabilities of the different keys are the same in value but
these values are associated with different keys when a different
key is used. The same set of values of P_E(K) have undergone
a permutation among the keys. Thus we have the result
Theorem 5: In a pure system the a posteriori probabilities
           of various messages P_E(M) are independent of the
           key that is chosen. The a posteriori
           probabilities of the keys P_E(K) are the same in
           value but undergo a permutation with a different
           key choice.
Roughly we may say that any key choice leads to
the same cryptanalytic problem in a pure cipher. Since the
different keys all result in cryptograms in the same residue
class this means that all cryptograms in the same residue
class are cryptanalytically equivalent; they lead to the same
a posteriori probabilities of messages and, apart from a
permutation, the same probabilities of keys.
As an example of this, simple substitution with
all keys equally likely is a pure cipher. The residue class
corresponding to a given cryptogram E is the set of all
cryptograms that may be obtained from E by operations
T_j T_k^-1 E. In this case T_j T_k^-1 is itself a substitution
and hence any substitution on E gives another member of the same
residue class. Thus if the cryptogram is
    E   = X C P P G C F Q
then
    E_1 = R D H H G D S N
    E_2 = A B C C D B E F
etc. are in the same residue class. It is obvious in this
case that these cryptograms are essentially equivalent.
All that is of importance in a simple substitution with
random key is the pattern of letter repetitions, the actual
letters being dummy variables. Indeed we might dispense
with them entirely, indicating the pattern of repetitions
in E as follows:*
This notation describes the residue class but eliminates all
information as to the specific member of the class. Thus it
leaves precisely that information which is cryptanalytically
pertinent. This is related to one method of attacking simple
substitution ciphers, the method of pattern words.
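
One way to compute the pattern of letter repetitions is to number the letters in order of first appearance. This numbering is only an illustrative stand-in for the diagram notation referred to in the text, and the function name `pattern` is hypothetical.

```python
def pattern(text):
    """Replace each letter by the index of its first appearance.

    Two cryptograms have the same pattern exactly when one can be
    obtained from the other by a simple substitution.
    """
    first_seen = {}
    out = []
    for ch in text:
        if ch not in first_seen:
            first_seen[ch] = len(first_seen)
        out.append(first_seen[ch])
    return tuple(out)
```

The cryptograms RDHHGDSN and ABCCDBEF both give the pattern `(0, 1, 2, 2, 3, 1, 4, 5)`, so they lie in the same residue class.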
In the Caesar type cipher only the first differences
mod 26 of the cryptogram are significant. Two cryptograms
with the same Δe_i are in the same residue class. One
breaks this cipher by the simple process of writing down the
26 members of the message residue class and picking out the
one which makes sense.
The Vigenere of period d with random key is another
example of a pure cipher. Here the message residue class
consists of all sequences with the same first differences of
letters separated by distance d as the cryptogram. For
d = 3 the residue class is defined by
    m_1 - m_4 = e_1 - e_4
    m_2 - m_5 = e_2 - e_5
    m_3 - m_6 = e_3 - e_6
    m_4 - m_7 = e_4 - e_7
    ...
where E = e_1, e_2, ... is the cryptogram and m_1, m_2, ... is any
M in the corresponding residue class.
*Suggested by a notation used by Quine in Symbolic Logic.
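
The invariance of differences at distance d under a Vigenere of period d can be checked with a short Python sketch. Letters are represented by their numbers 0 to 25, and the function names are hypothetical.

```python
def vigenere(message, key):
    """Encipher a list of letter numbers with a repeating numeric key."""
    return [(m + key[i % len(key)]) % 26 for i, m in enumerate(message)]


def differences(sequence, d):
    """Differences mod 26 of letters separated by distance d."""
    return [(sequence[i] - sequence[i + d]) % 26
            for i in range(len(sequence) - d)]
```

For any message and any key of length 3, `differences(message, 3)` equals `differences(vigenere(message, key), 3)`, because letters three places apart receive the same key letter and it cancels in the difference.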
In the transposition cipher of period d with random
key, the residue class consists of all arrangements of the e_i
in which no e_i is moved out of its block of length d, and any
two e_i at a distance d remain at this distance. This is used
in breaking these ciphers as follows. The cryptogram is written
in successive blocks of length d, one under another as below
(d = 5):
    e_1   e_2   e_3   e_4   e_5
    e_6   e_7   e_8   e_9   e_10
    e_11  e_12  .     .     .
    .     .     .     .     .
The columns are then cut apart and rearranged to make sense.
When the columns are cut apart, the only information remaining
is the residue class of the cryptogram.
Theorem 6: If T is pure then T_i T_j^-1 T = T where
T_i, T_j are any two transformations of T. Conversely if
this is true for any T_i, T_j in a system T then T is pure.
The first part of this theorem is obvious from the
definition of a pure system. To prove the second part we note
first that if T_i T_j^-1 T = T then T_i T_j^-1 T_s is a
transformation of T. It remains to show that all keys are
equiprobable. We have T = Σ_s p_s T_s and
    Σ_s p_s T_i T_j^-1 T_s = Σ_s p_s T_s
The term in the left hand sum with s = j yields p_j T_i.
The only term in T_i on the right is p_i T_i. Since all
coefficients are non-negative it follows that
    p_j ≤ p_i
The same argument holds with i and j interchanged and
consequently
    p_j = p_i
and T is pure. Thus the condition that T_i T_j^-1 T = T might
be used as an alternative definition of a pure system.
The property of purity in a system is connected with
idempotence. Thus consider the system S = T T' where T is
pure. We have
    T_i T_j^-1 T_s T_t^-1 = T_i T_j^-1 T_j T_r^-1 = T_i T_r^-1
so that the transformations of S^2 are the same as those of S,
and since both S and S^2 are pure we have
    S = S^2
Theorem 7: If T is pure S = T T' is pure and S^2 = S.
An endomorphic system T which satisfies the condition
T_i T_j = T_s (but not necessarily with all key probabilities
equal) can be shown to approach a pure cipher on raising to a
high power, namely the one with the same transformations, but
with all probabilities equalized. In fact the probabilities
for T^(n+1) are derived from those for T^n by a Markoff process,
of a special type due to the group property. This special
type always approaches the limit of equalized probabilities.
This same argument applies more generally. We have
Theorem 8: Let T be any endomorphic cipher. If T^n approaches
           any limit at all, which will necessarily occur if
           all the transformations of T^n lie in a finite set
           (no matter how large n) and the transformations of
           T include the identity then this limit will be a
           pure cipher.
As an example consider the cipher
    R = p T + q S
where T is transposition with random key and S substitution
with random key. We have
    S^2 = S
    T^2 = T
    S T = T S
and hence any product of T's and S's such as T S T = T T S S
reduces to S T. Thus
    R^n = p^n T + q^n S + (1 - p^n - q^n) S T
As n → ∞ the first two terms approach zero and
    Lim R^n = S T
    n → ∞
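
The expansion of R^n can be followed mechanically by treating T, S and S T as three symbols obeying S^2 = S, T^2 = T and S T = T S. The Python sketch below is only a bookkeeping device under those three rules; the names `multiply` and `power` are hypothetical.

```python
def multiply(x, y):
    """Product of two of the systems 'T', 'S', 'ST'.

    Because S and T are idempotent and commute, a product is fixed
    by which of the two letters occur in it at least once.
    """
    return "".join(sorted(set(x) | set(y)))


def power(p, n):
    """Coefficients of T, S and ST in (p T + q S)^n with q = 1 - p."""
    base = {"T": p, "S": 1 - p}
    result = dict(base)
    for _ in range(n - 1):
        nxt = {}
        for a, pa in result.items():
            for b, pb in base.items():
                c = multiply(a, b)
                nxt[c] = nxt.get(c, 0.0) + pa * pb
        result = nxt
    return result
```

For p = 0.3 and n = 6 the coefficients returned by `power` agree with p^n, q^n and 1 - p^n - q^n, and the coefficient of `"ST"` tends to 1 as n grows.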
The concepts of pure and mixed languages and pure
and mixed ciphers have an application in practical
cryptanalysis, if we interpret them somewhat loosely. When a
cryptographer starts work on a cryptogram, his first job is to
determine the original language. Approximately then he is
determining the pure component of the general language space
    L = p_1 L_1 + p_2 L_2 + ... + p_n L_n
where say L_1 is English, L_2 German, etc. Of course these are
not pure but the different components of them are fairly close
together in statistical structure.
The second thing a cryptographer does is to
determine the "type" of cipher that was used; usually this is
about the same as finding the pure component in the general
cipher system
    R = p_1 S + p_2 T + p_3 V + ...
where S say is simple substitution, T is transposition, etc.
A Vigenere V of unknown period is not a pure cipher but the
decomposition
    V = p_1 V_1 + p_2 V_2 + p_3 V_3 + ...
where V_i is of period i, is into pure components (if all keys
are equally likely for any period). In solving a Vigenere
the first problem is to determine the period. The same is
true in transposition.
The reason for this initial isolation of pure
or nearly pure language and cipher is that only then can a
simple meaningful statistical analysis be carried out.
16. Involutory Systems
If every transformation in a system T is its own
inverse, i.e. if
    T_i T_i = I
for every i, the system will be called involutory. Such
systems are important practically since the enciphering and
deciphering operations are then identical. This leads to
simplified instructions to cryptographic clerks in manual
operation, or in mechanical cases the same machine with the
same key setting may be used for both operations.
Examples: In simple substitution we may limit our
transformations to those in which when letter \theta is
the substitute for \phi, \phi is the substitute for
\theta. Another example is the Beaufort cipher.
If T is involutory, so is the system whose operations are
    S_i^{-1} T_i S_i
since
    (S_i^{-1} T_i S_i)(S_i^{-1} T_i S_i) = S_i^{-1} T_i T_i S_i = I
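The involutory property of the Beaufort cipher can be checked with a short Python sketch. It assumes the usual arithmetic model of the Beaufort, in which each cryptogram letter is the key letter minus the message letter (Mod 26); that model, the function name `beaufort` and the sample message and key are assumptions made for illustration and are not taken from the text.

```python
import string

ALPHABET = string.ascii_uppercase

def beaufort(text: str, key: str) -> str:
    """Beaufort-style transformation: each output letter is key - text (Mod 26).

    The key is repeated cyclically over the text. Because
    k - (k - m) = m (Mod 26), the map is its own inverse.
    """
    out = []
    for i, ch in enumerate(text):
        m = ALPHABET.index(ch)
        k = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(k - m) % 26])
    return "".join(out)

message = "INVOLUTORY"   # hypothetical message
key = "KEY"              # hypothetical key
cryptogram = beaufort(message, key)
# Enciphering the cryptogram again with the same key recovers the message.
recovered = beaufort(cryptogram, key)
```

The same routine therefore serves for both enciphering and deciphering, which is the practical advantage described above.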
17. Similar and Weakly Similar Systems
Two secrecy systems R and S will be said to be
similar if there exists a transformation A having an inverse
A^{-1} such that
    R = A S
This means that enciphering with R is the same as enciphering
with S and then operating on the result with the transformation
A. If we write R ~ S to mean R is similar to S then it
is clear that R ~ S implies S ~ R. Also R ~ S and S ~ T imply
R ~ T and finally R ~ R. These are summarized in mathematical
terminology by saying that similarity is an equivalence
relation.
The cryptographic significance of similarity is that
if R ~ S then R and S are equivalent from the cryptanalytic
point of view. Indeed if a cryptanalyst intercepts a cryptogram
in system S he can transform it to one in system R by
merely applying the transformation A to it. A cryptogram in
system R is transformed to one in S by applying A^{-1}. If R
and S are applied to the same language or message space,
there is a one-to-one correspondence between the resulting
cryptograms. Corresponding cryptograms give the same
distribution of a posteriori probabilities for all messages.
If one has a method of breaking the system R then
any system S similar to R can be broken by reducing to R
through application of the operation A. This is a device
that is frequently used in practical cryptanalysis.
Examples: As a trivial example, simple substitution where
the substitutes are not letters but arbitrary
symbols is similar to simple substitution using
letter substitutes. A second example is the
Caesar and the reversed Caesar type ciphers.
The latter is sometimes broken by first
transforming into a Caesar type. The Vigenere,
Beaufort and Variant Beaufort are all similar,
when the key is random. The "autokey" cipher
primed with the key K_1 K_2 ... K_d is similar to a
Vigenere type with the key alternately added and
subtracted Mod 26. The transformation A in this
case is that of "deciphering" the autokey with
a series of d A's for the priming key.
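The similarity of the Vigenere, Beaufort and Variant Beaufort can be illustrated with a small Python sketch. It assumes the common arithmetic conventions (Vigenere e = m + k, Beaufort e = k - m, Variant Beaufort e = m - k, all Mod 26) and takes for the transformation A the map sending each letter x to -x (Mod 26). These conventions, the function names and the sample text are assumptions chosen for the example.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def _combine(text, key, op):
    """Apply op(m, k) Mod 26 letter by letter, repeating the key cyclically."""
    return "".join(
        ALPHABET[op(ALPHABET.index(ch), ALPHABET.index(key[i % len(key)])) % 26]
        for i, ch in enumerate(text)
    )

def vigenere(text, key):
    return _combine(text, key, lambda m, k: m + k)

def beaufort(text, key):
    return _combine(text, key, lambda m, k: k - m)

def variant_beaufort(text, key):
    return _combine(text, key, lambda m, k: m - k)

def negate(text):
    """The transformation A of this sketch: replace each letter x by -x (Mod 26)."""
    return "".join(ALPHABET[(-ALPHABET.index(ch)) % 26] for ch in text)

message, key = "SIMILARSYSTEMS", "CIPHER"   # hypothetical message and key
# A applied to a Beaufort cryptogram gives the Variant Beaufort cryptogram.
a_of_beaufort = negate(beaufort(message, key))
# Variant Beaufort with key k equals Vigenere with the negated key -k.
vigenere_neg_key = vigenere(message, negate(key))
```

Since a random key remains random after negation, the three systems differ only by an invertible transformation of the cryptogram or a relabeling of equally likely keys.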
Two systems R and S are weakly similar if there
exist two transformations A and B having inverses A^{-1} and
B^{-1} with
    R = A S B
This means that system R is the same as applying first B
to the language, then S, and finally A. This relation is
also an equivalence relation.
Finding a method of solution for system R with
language L is equivalent to finding a solution for S with
language B L.
We may note that if R is pure and S is weakly
similar to R then S is pure. This follows from
    R_i R_j^{-1} R_k = R_l
    R_i = A S_i B
    R_j^{-1} = B^{-1} S_j^{-1} A^{-1}
    R_k = A S_k B
where we assume corresponding transformations in R and S
to have the same subscripts. Hence
    R_i R_j^{-1} R_k = A S_i S_j^{-1} S_k B = R_l
    S_i S_j^{-1} S_k = A^{-1} R_l B^{-1} = S_l
and S is therefore pure.
PART II
Theoretical Secrecy
Introduction
We now consider problems connected with the "theoretical
secrecy" of a system. How immune is a system to cryptanalysis
when the cryptanalyst has unlimited time and manpower available
for the analysis of cryptograms? Does a cryptogram have a
unique solution (even though it may require an impractical amount
of work to find it) and if not how many reasonable solutions does
it have? How much text in a given system must be intercepted
before the solution becomes unique? Are there systems which never
become unique in solution no matter how much enciphered text is
intercepted? Are there systems for which no information whatever
is given to the enemy no matter how much text is intercepted?
18. Perfect Secrecy
Let us suppose the possible messages are finite in
number M_1, ..., M_n and have a priori probabilities P(M_1), ...,
P(M_n), and that these are enciphered into the possible
cryptograms E_1, ..., E_m by
    E = T_i M .
The cryptanalyst intercepts a particular E and can
then calculate the a posteriori probabilities for the various
messages, P_E(M). It is natural to define perfect secrecy by
the condition that for all E, the a posteriori probabilities are
equal to the a priori probabilities independently of the values
of these. In this case, intercepting the message has given the
cryptanalyst no information.* Any action of his which depends
on the information contained in the cryptogram cannot be altered,
for all of his probabilities as to what the cryptogram contains
remain unchanged. On the other hand, if the condition is not
satisfied there will exist situations in which the enemy has
certain a priori probabilities, and certain key and messages are
chosen where the enemy's probabilities do change. This in turn
may affect his actions and thus perfect secrecy has not been
obtained. Hence the definition given is necessarily required by
our ideas of what perfect secrecy should mean.
*A purist might object that the enemy has obtained a bit of
information in that he knows a message was sent. This may be
answered by having among the messages a "blank" corresponding to
"no message". If no message is originated the blank is enciphered
and sent as a cryptogram. Then even this modicum of remaining
information is eliminated.
A necessary and sufficient condition for perfect secrecy
can be found as follows. We have by Bayes' theorem
    P_E(M) = \frac{P(M) P_M(E)}{P(E)}
and this must equal P(M) for perfect secrecy. Hence either
P(M) = 0, a solution that must be excluded since we demand the
equality independent of the values of P(M), or
    P_M(E) = P(E)
for every M and E. Conversely if P_M(E) = P(E) then
    P_E(M) = P(M)
and we have perfect secrecy. Thus we have the result:
Theorem 9: A necessary and sufficient condition for
perfect secrecy is that
    P_M(E) = P(E)
for all M and E. That is, P_M(E) must be
independent of M.
The probability of all keys that transform M_i into a given
cryptogram E is equal to that of all keys transforming M_j into
the same E.
Now there must be as many E's as there are M's, since
fixing i, T_i gives a one-to-one correspondence between all the
M's and some of the E's. For perfect secrecy P_M(E) = P(E) \neq 0
for any of these E's and any M. Hence there is at least one key
transforming any M into any of these E's. But all the keys from
a fixed M to different E's must be different, and therefore the
number of different keys is at least as great as the number of
M's. It is possible to obtain perfect secrecy with no more, as
one shows by the following example. Let the M_i be numbered 1 to
n and the E_i the same, and using n keys let
    T_i M_j = E_s
where s = i + j (Mod n). In this case we see that
P_M(E) = 1/n = P(E) and we have perfect secrecy. An example is
shown with n = 5.
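The example T_i M_j = E_s with s = i + j (Mod n) can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions: the n keys are taken as equally likely, indices run from 0 to n - 1 rather than 1 to n, and the a priori message probabilities are hypothetical values chosen only to be unequal. Exact fractions are used so that the a posteriori and a priori probabilities can be compared without rounding.

```python
from fractions import Fraction

def joint_distribution(prior, n):
    """Joint probabilities P(M_j, E_e) for T_i M_j = E_s, s = i + j (Mod n),
    with the n keys equally likely."""
    joint = {}
    for i in range(n):          # key index
        for j in range(n):      # message index
            e = (i + j) % n
            joint[(j, e)] = joint.get((j, e), 0) + Fraction(1, n) * prior[j]
    return joint

def posterior(prior, n, e):
    """A posteriori probabilities P_E(M_j) after intercepting cryptogram E_e."""
    joint = joint_distribution(prior, n)
    p_e = sum(joint[(j, e)] for j in range(n))
    return [joint[(j, e)] / p_e for j in range(n)]

n = 5
# Hypothetical, deliberately unequal a priori probabilities.
prior = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8),
         Fraction(1, 16), Fraction(1, 16)]
posteriors = [posterior(prior, n, e) for e in range(n)]
```

Whatever cryptogram is intercepted, the list returned by `posterior` coincides with `prior`, which is the defining condition of perfect secrecy.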
These perfect systems in which the number of cryptograms,
the number of messages, and the number of keys are all
equal are characterized by the properties that (1) each M is
connected to each E by exactly one line, (2) all keys are equally
likely. Thus the three matrix representations of the system are
"latin squares".
We have then concealed completely an amount of information
at most log n with a size of key log n. This is the first
example of a general principle which we will often see, that
there is a limit to what we can obtain with a given key size -- the
amount of uncertainty we can introduce into the solution of a
cryptogram cannot be greater than the key size. Here we have
concealed all the information but the key size is as large as the
message space.
We now consider the case where |M| is infinite; in particular
suppose the message generated as an unending sequence of letters
by a Markoff process. The maximum rate of this source is R_0.
It is clear from our results above that no finite key will give
perfect secrecy. We suppose then that the key source generates
key also in the same manner, i.e. as an infinite sequence of
symbols with a mean rate R_K. Suppose that only a certain length of
key L_K is needed to encipher and decipher a length L_M of message.
Theorem 10: For perfect secrecy (when the a priori
probabilities of various messages can be anything),
for large L
    R_0 L_M \le R_K L_K
and the rate (R_K + \epsilon) is asymptotically
sufficient.
This may be proved by the same method (essentially) as
the finite case. This case is realized by the Vernam system.
These results have been deduced on the basis of unknown
or arbitrary a priori probabilities for the messages. The key
required for perfect secrecy depends then on the total number of
possible messages, or on the maximum rate R_0 of the message
source.
One would suspect that if the message space has fixed
known statistics, so that it has a definite mean rate R of
generating information, then the amount of key needed could be
reduced in an average sense in just this ratio R/R_0, and this is
indeed true. In fact the message can be passed through a
transducer which transforms it into a normal form and reduces the
expected length in just this ratio, and then a Vernam system
may be applied to the result. Evidently the amount of key used
per letter of message is statistically reduced by a factor
R/R_0 and in this case the key source and information source are
just matched -- an alternative of key conceals an alternative of
information. It is easily seen also, by the methods used in the
"Information" paper, that this is the best that can be done.
Theorem 11: Perfect secrecy (omitting the condition of
independence of a priori probabilities) for
a source with fixed statistics and a rate
R of generating information can be achieved
with a key source which generates at the
rate (R L_M / L_K + \epsilon) where L_M and L_K are message
and key lengths which correspond. A rate
less than R L_M / L_K is insufficient.
Perfect secrecy systems have a place in the practical
picture -- they may be used either where the greatest importance
is attached to complete secrecy -- e.g. correspondence between
the highest levels of command, or in cases where the number of
possible messages is small. Thus, to take an extreme example,
if only two messages "yes" or "no" were anticipated a perfect
system would be in order, with perhaps the transformation table
            K
    M       A    B
    yes     0    1
    no      1    0
The disadvantage of perfect systems for large
correspondence systems is, of course, the equivalent amount of key
that must be sent. In succeeding sections we consider what can
be achieved with smaller key size, in particular with finite
keys.
19. Equivocation
Let us suppose that a simple substitution cipher has
been used on English text and that we intercept a certain amount,
N letters, of the enciphered text. For N fairly large, more
than say 50 letters, there is nearly always a unique solution to
the cipher; i.e. a single good English sequence which transforms
into the intercepted material by a simple substitution. With a
smaller N, however, the chance of more than one solution is
greater; with N = 15 there will generally be quite a number of
possible fragments of text that would fit, while with N = 8 a
good fraction (of the order of 1/8) of all reasonable English
sequences of that length are possible, since there is seldom
more than one repeated letter in the 8. With N = 1 any letter
is clearly possible and has the same a posteriori probability
as its a priori probability. For one letter the system is
perfect.
This happens generally with solvable ciphers. Before
any material is intercepted we can imagine the a priori
probabilities attached to the various possible messages, and also
to the various keys. As material is intercepted, the cryptanalyst
calculates the a posteriori probabilities; and as N increases
the probabilities of certain messages increase, and of most
decrease, until finally only one is left, which has a probability
nearly one, while the total probability of all others is nearly
zero.
This calculation can actually be carried out for very
simple systems. Table 1 shows the a posteriori probabilities
for a Caesar type cipher applied to English text, with the key
chosen at random from the 26 possibilities. To enable the use
of standard letter, digram and trigram frequency tables the text
has been started at a random point (by opening a book and putting
a pencil down at random on the page). The message selected in
this way begins "creases to ..." starting inside the word
increases. If the message were to start with the beginning of a
sentence a different set of probabilities must be used,
corresponding to the frequencies of letters, digrams, etc., at the
beginning of sentences.
The Caesar with random key is a pure cipher and the
particular key chosen does not affect the a posteriori
probabilities. To determine these we need merely list the possible
decipherments by all keys and calculate their a priori
probabilities. The a posteriori probabilities are these divided by
their sum. These possible decipherments are found by the
standard process of "running down the alphabet" from the message
and are listed at the left. These form the residue class for
the message. For one intercepted letter the a posteriori
probabilities are equal to the a priori probabilities for letters and
are shown in the column headed N = 1. For two intercepted
letters the probabilities are those for digrams adjusted to
sum to unity and these are shown in the column N = 2.
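The procedure just described, listing the decipherments by "running down the alphabet" and dividing each a priori probability by the sum, can be sketched in Python as follows. The function names are hypothetical, and the two a priori values supplied at the end are made-up numbers used only to show the normalization; they are not taken from any frequency table.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def run_down_alphabet(cryptogram):
    """List the 26 candidate decipherments obtained by advancing every letter
    of the cryptogram by 0, 1, ..., 25 places (Mod 26)."""
    return [
        "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] for ch in cryptogram)
        for shift in range(26)
    ]

def a_posteriori(candidates, a_priori):
    """Divide each candidate's a priori probability by the sum over all candidates.

    a_priori maps a candidate string to its a priori probability; candidates
    missing from the mapping are treated as having probability zero.
    """
    weights = [a_priori.get(c, 0.0) for c in candidates]
    total = sum(weights)
    return {c: w / total for c, w in zip(candidates, weights) if w > 0}

# Any member of the residue class generates the whole class; here we start
# from the fragment CREAS itself for simplicity.
candidates = run_down_alphabet("CREAS")
# Hypothetical a priori values for two candidates, chosen only for illustration.
posterior = a_posteriori(candidates, {"CREAS": 0.0011, "PERNF": 0.0009})
```

Because the key does not enter the calculation, the same list of candidates and the same a posteriori probabilities result whichever of the 26 cryptograms in the class is intercepted.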
Table 1
A Posteriori Probabilities for a Caesar Type Cryptogram
    Decipherments    N = 1    N = 2    N = 3    N = 4
    C R E A S        .032     .015     .111     .55
    D S F B T        .036     .068
    E T G C U        .123     .170
    F U H D V        .023     .023
    G V I E W        .016
    H W J F X        .051     .015
    I X K G Y        .072
    J Y L H Z        .001
    K Z M I A        .005
    L A N J B        .040     .072     .250     .01
    M B O K C        .020     .019     .022     .01
    N C P L D        .072     .066
    O D Q M E        .079     .034
    P E R N F        .023     .085     .438     .43
    Q F S O G        .002
    R G T P H        .060     .013
    S H U Q I        .066     .064     .005
    T I V R J        .096     .272     .166
    U J W S K        .030
    V K X T L        .009
    W L Y U M        .020     .008     .005
    X M Z V N        .002
    Y N A W O        .019     .006
    Z O B X P        .001
    A P C Y Q        .080     .066
    B Q D Z R        .016
    Q (digits)       1.248    .999     .602     .340
Trigram frequencies have also been tabulated and these are shown
in column N = 3. For four and five letter sequences
probabilities were obtained by multiplication from trigram frequencies
since approximately
    p(ijkl) = p(ijk) p_{jk}(l)
Note that at three letters the field has narrowed down
to four messages of fairly high probability, the others being
small in comparison. At four there are two possibilities and at
five just one, the correct decipherment.
In principle this could be carried out with any system
but unless the key is very small the number of possibilities is
so large that the work involved prohibits the actual calculation.
This set of a posteriori probabilities describes how
the cryptanalyst's knowledge of the message and key gradually
becomes more precise as enciphered material is obtained. This
description, however, is much too involved and difficult to
obtain for our purposes. What is desired is a simplified
description of this approach to uniqueness of the possible solutions.
We will first define a quantity Q called the
"equivocation" which measures in an average way the uncertainty of
the solution, or how far it is from unicity. Suppose that a
certain cryptogram E of N letters has been intercepted. The
cryptanalyst can in principle calculate the a posteriori
probabilities by the use of Bayes' theorem. Thus
    P_E(M) = P(M) P_M(E) / P(E)
Similarly the probabilities for various keys, after E has been
intercepted are given by
    P_E(K) = P(K) P_K(E) / P(E)
The equivocation of the message should measure in some
way how spread out these probabilities P_E(M) are; how far they
are from being concentrated at one message. In line with our
general principles of measuring such dispersion, as in the
case of choice, uncertainty, and generating information, we define
the equivocation of the message when E has been intercepted by
    Q_E(M) = -\sum_M P_E(M) \log P_E(M)
the summation being over all possible messages. Similarly the
equivocation in key when E is intercepted is given by
    Q_E(K) = -\sum_K P_E(K) \log P_E(K)
The same general arguments used to justify our measure
of information rate may be used here to justify the equivocation
measure. We note that equivocation zero requires that one
message (or key) have probability one, all others zero. Equivocation
is measured in the same units as information, i.e. alternatives,
digits, etc., according as the logarithmic base is 2, 10, etc.
In fact, equivocation is almost identical with information, the
difference being one of point of view. In information we stress
the notion of how much freedom we have in choosing one element
from a set with certain probabilities -- in equivocation we
emphasize the uncertainty of our knowledge of what was chosen when
the probabilities have certain values.
Although any one number can hardly be expected to
describe the set P_E(M) perfectly for all purposes, I think the Q
defined here does as well as any single statistic can. Some of
the theorems which follow indicate the mathematical "naturalness"
of this particular measure.
The values of equivocation for the Caesar type
cryptogram considered above have been calculated and are given in the
last row of Table 1. This is the Q for both key and message,
the two being equal in this case.
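The last row of Table 1 can be recomputed from the columns above it. The Python sketch below is an illustration only: the dictionary `TABLE_1` holds the nonzero entries of each column as read from the table, and the function name `equivocation` is our own. Logarithms are taken to base 10, so the result is in decimal digits; the entries are used as tabulated, without renormalizing the small rounding error in their sums.

```python
import math

# Nonzero entries of each column of Table 1, as read from the table.
TABLE_1 = {
    1: [.032, .036, .123, .023, .016, .051, .072, .001, .005, .040, .020,
        .072, .079, .023, .002, .060, .066, .096, .030, .009, .020, .002,
        .019, .001, .080, .016],
    2: [.015, .068, .170, .023, .015, .072, .019, .066, .034, .085, .013,
        .064, .272, .008, .006, .066],
    3: [.111, .250, .022, .438, .005, .166, .005],
    4: [.55, .01, .01, .43],
}

def equivocation(probabilities, base=10):
    """Q = -sum p log p over the nonzero probabilities
    (decimal digits when base is 10, alternatives when base is 2)."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# Equivocation in decimal digits for N = 1, 2, 3, 4 intercepted letters.
q_digits = {n: equivocation(column) for n, column in TABLE_1.items()}
```

The values obtained agree with the tabulated row Q (digits) to about three decimal places, and they fall steadily as N increases.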
The definitions given above involve a particular
intercepted E, and are the equivocations for that intercepted
cryptogram. We wish, however, to find a measure of the equivocation
for the system as a whole, which will describe this progress
toward uniqueness as N increases in an average sort of way.
To do this we form a weighted average of the equivocations for
each particular intercepted message E, weighting in accordance
with the probabilities of getting the E in question. This will
be called the mean equivocation of the system, or where there
is no chance of confusion with the narrower equivocation for a
particular E, we abbreviate to merely the equivocation. The
mean equivocation of message is
    Q(M) = -\sum_{M,E} P(E) P_E(M) \log P_E(M)
the summation being over all M and all E. Since
    P(E) P_E(M) = P(E, M)
the probability of getting both E and M, we can write this
    Q(M) = -\sum P(M,E) \log P_E(M) = -\sum P(M,E) \log \frac{P(M) P_M(E)}{P(E)}
Similarly
    Q(K) = -\sum P(K,E) \log \frac{P(K) P_K(E)}{P(E)} .
Either of these mean equivocations is a theoretical
measure of the secrecy value of the system. We say theoretical
since even when the equivocation is zero, which corresponds to
no uncertainty as to the message, it may require a tremendous
amount of labor to locate the particular message where the
probability is one. It might, for example, be necessary to try each
possible K in succession until one was found that transformed
the intercepted E into reasonable text in the language. Thus a
system would be practically very good, but theoretically solvable.
The equivocation may be said to measure the degree of secrecy
when the cryptanalyst has unlimited time and energy.
The equivocation is, of course, a function of N, the
number of letters intercepted. The functions Q(K,N) and Q(M,N)
will be called the equivocation characteristics of the system.
The following data will be helpful in forming a picture
of what small values of equivocation represent.
An equivocation of .1 alternative would result (1) if
9 times in 10 there was no uncertainty as to M, the tenth time
two M's were equally probable, or (2) if every time there were
two possibilities one with probability .983, the other with
probability .017, or (3) if 99 times in 100 there was no
uncertainty, the 100th time 1000 equally likely possibilities.
An equivocation of .01 would result (1) if every time
there were two possibilities one with probability .999, the other
with probability .001, or (2) if 99 times in 100 there is no
uncertainty, the other time two equally likely possibilities, or
(3) if 999 times in 1000 there is no uncertainty, the other time
6 or 7 equally likely possibilities.
20. Properties of Equivocation
Equivocation may be shown to have a number of
interesting properties, most of which fit into our intuitive picture
of how such a quantity should behave. We may first show, by an
example, the somewhat surprising fact that after a cryptanalyst
has intercepted certain special E's, his equivocation as to key
or message may be greater than before he intercepted anything.
The intercepted material has increased his ignorance of what
happened! Suppose there are only two messages M_1 and M_2 with
a priori probabilities p and q, and that a simple substitution
is used according to the following table, the two keys K_1 and K_2
also having the a priori probabilities p and q.
              K_1    K_2
    M_1       E_2    E_1
    M_2       E_1    E_2
Before the interception, the equivocation of both key and message
is - (p log p + q log q), which is less than one alternative if
p \neq q. If p >> q there is little uncertainty as to which message
and key will be chosen, M_1 and K_1. Now suppose he intercepts E_1.
The a posteriori probabilities of both keys and both messages are
easily seen to be 1/2 and hence the equivocation for both key
and message is one alternative, greater than before. On the other
hand, if E_2 is intercepted, the more probable event, the
equivocation for both key and message decreases, more than enough to
compensate for the other increase, and the mean equivocation of
both key and message decreases. This is a general property of all
secrecy systems.
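The two-message example can be worked through numerically. The Python sketch below reads the table as K_1 sending M_1 to E_2 and M_2 to E_1, with K_2 doing the reverse, which is the reading implied by the a posteriori probabilities of 1/2 for E_1. The function names and the value p = 0.9 are assumptions made for illustration, and equivocations are in alternatives (base 2).

```python
import math

def entropy_bits(probs):
    """Equivocation -sum p log2 p in alternatives, skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def two_message_example(p):
    """Equivocations for the table M1: (K1 -> E2, K2 -> E1),
    M2: (K1 -> E1, K2 -> E2), with P(M1) = P(K1) = p."""
    q = 1 - p
    before = entropy_bits([p, q])
    # E1 arises from (M1, K2) and (M2, K1), each with probability p*q,
    # so both messages (and both keys) are equally likely a posteriori.
    p_e1 = 2 * p * q
    after_e1 = entropy_bits([0.5, 0.5])
    # E2 arises from (M1, K1) with probability p*p and (M2, K2) with q*q.
    p_e2 = p * p + q * q
    after_e2 = entropy_bits([p * p / p_e2, q * q / p_e2])
    # Mean equivocation: average over the two possible cryptograms.
    mean_after = p_e1 * after_e1 + p_e2 * after_e2
    return before, after_e1, after_e2, mean_after

before, after_e1, after_e2, mean_after = two_message_example(0.9)
```

With p = 0.9 the equivocation rises from under half an alternative to one full alternative when E_1 is intercepted, yet the mean over both cryptograms is smaller than the value before interception.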
Theorem 12: The mean equivocation of key, Q_K(N), is a
non-increasing function of N. The mean equivocation of the
first A letters of the message is a non-increasing
function of the number N which have been intercepted.
If N letters have been intercepted, the equivocation
of the first N letters of message is less than or
equal to that of the key. These may be written
    Q_K(S) \le Q_K(N),    S \ge N
    Q_M(S) \le Q_M(N),    S \ge N
    Q_M(N) \le Q_K(N)
The qualification regarding A letters in the second
result of the theorem is so that the equivocation will not be
calculated with respect to the amount of message that has been
intercepted. If it is, the message equivocation may (and usually
does) increase for a time, due merely to the fact that more
letters stand for a larger possible range of messages. The
results of the theorem are what we might hope from a good measure
of equivocation, since we would hardly expect to be worse off on
the average after intercepting material than before. The fact
that they can be proved gives additional justification to our
definition.
The results of this theorem can be proved by a
substitution in the property 6 of section 1. Thus to prove the
first or second we have for any chance events A and B
    Q(B) \ge Q_A(B)
If we identify B with the key (knowing the first S letters of
cryptogram) and A with the remaining N - S letters we obtain
the first result. Similarly identifying B with the message
gives the second result. The last result follows from
    Q(M) \le Q(K) + Q_K(M)
and the fact that Q_K(M) = 0 since K uniquely determines M.
Theorem 13:
    Q(K) = |M| - |E| + |K|
    Q(M) = |M| - |E| + |H|
where
    |H| = -\sum_{M,E} P(M,E) \log P_M(E) .
We have
    Q(K) = -\sum_{E,K} P(K) P_K(E) \log \frac{P(K) P_K(E)}{P(E)}
Hence
    Q(K) = -\sum P(K) P_K(E) \log P(K) - \sum P(K) P_K(E) \log P_K(E)
           + \sum P(K) P_K(E) \log P(E)
Summing the first term on E gives -\sum P(K) \log P(K) = |K|.
In the second term P_K(E) is P(M), the unique M that gives E
with key K. Summing on K then gives -\sum P(M) \log P(M) = |M|.
The third term is \sum P(E) \log P(E) = -|E|.
The second equation in the theorem is proved by the
same method.
    Q(M) = -\sum P(E) P_E(M) \log P_E(M)
         = -\sum P(M) P_M(E) \log \frac{P(M) P_M(E)}{P(E)}
         = -\sum P(M) P_M(E) \log P(M) - \sum P(M) P_M(E) \log P_M(E)
           + \sum P(M) P_M(E) \log P(E)
         = |M| - |E| - \sum P(M) P_M(E) \log P_M(E)
The last term here may be interpreted as follows. Group
together all the different keys that transform a fixed M into
the same E, giving the total probability to the group, which
will be P_M(E). The last term is the average size of this group
space weighted according to the probability P(M) of choosing
among the groups leading out of M. In case no group contains
more than one element (at any rate no group from an M with
P(M) > 0) then |H| = |K| and Q(K) = Q(M). This is also clear
since there is then a one-to-one correspondence between the
keys and messages for any given E.
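Both equations of Theorem 13 can be checked numerically on a small system. The Python sketch below is an illustration under stated assumptions: it uses the cipher E = M + K (Mod n) on single symbols, with hypothetical unequal probabilities for messages and keys, and computes Q(K) and Q(M) directly from their definitions before comparing them with the right-hand sides of the theorem. All names are our own, and logarithms are to base 2.

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy -sum p log2 p of a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def check_theorem_13(p_msg, p_key, n):
    """Compare Q(K) and Q(M), computed from their definitions, with
    |M| - |E| + |K| and |M| - |E| + |H| for E = M + K (Mod n)."""
    joint_ke = defaultdict(float)
    joint_me = defaultdict(float)
    p_e = defaultdict(float)
    for m, pm in p_msg.items():
        for k, pk in p_key.items():
            e = (m + k) % n
            joint_ke[(k, e)] += pm * pk
            joint_me[(m, e)] += pm * pk
            p_e[e] += pm * pk
    # Mean equivocations: -sum P(X, E) log P_E(X), with P_E(X) = P(X, E) / P(E).
    q_k = -sum(p * math.log2(p / p_e[e]) for (k, e), p in joint_ke.items() if p > 0)
    q_m = -sum(p * math.log2(p / p_e[e]) for (m, e), p in joint_me.items() if p > 0)
    # |H| = -sum P(M, E) log P_M(E), with P_M(E) = P(M, E) / P(M).
    h_term = -sum(p * math.log2(p / p_msg[m])
                  for (m, e), p in joint_me.items() if p > 0)
    rhs_k = H(p_msg) - H(p_e) + H(p_key)
    rhs_m = H(p_msg) - H(p_e) + h_term
    return q_k, rhs_k, q_m, rhs_m

# Hypothetical probabilities for three messages and three keys.
p_msg = {0: 0.5, 1: 0.3, 2: 0.2}
p_key = {0: 0.6, 1: 0.3, 2: 0.1}
q_k, rhs_k, q_m, rhs_m = check_theorem_13(p_msg, p_key, 3)
```

In this cipher exactly one key carries a given M into a given E, so no group contains more than one element and the computed Q(K) and Q(M) coincide, as the closing remark of the proof states.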
From the first equation of the theorem we may conclude
that Q(K) = |K| in case |M| = |E|. This latter occurs in
particular if all M's are equally likely and all E's equally likely
and there are the same number of each. It is easy to see that
this is the case with a language in which every letter is equally
likely and independent, and when almost any of the simple ciphers
are used.
If we have a product system S = T R, it is to be
expected that the second enciphering process does not decrease
the equivocation of message and this is actually true as can
be shown by the methods used above. If T and R commute either
may be considered as being the first and hence in this case
the equivocation with S is not less than the maximum for the
two systems R and T. Simple examples show that this does not
hold necessarily if R and T do not commute.
Theorem 14: The equivocation in message of a product
system S = T R is not less than that when
only R is used. If T R = R T it is not less
than the maximum of those for R and T alone.
If we have a product of several systems R S T U, we
can of course extend this, to say that the equivocation of
R S T U is not less than that of S T U, which is not less than
that for T U, etc.
There is no similar theorem for the inner product since
for example if T and R are inverse processes their inner product
is the identity and the resulting equivocation zero.
Suppose we have a system T which can be written as a
weighted sum of several systems R, S, ..., U
    T = p_1 R + p_2 S + ... + p_m U,    \sum p_i = 1
and that systems R, S, ..., U have equivocation characteristics
Q_1, Q_2, ..., Q_m.
Theorem 15: The equivocation Q of a weighted sum of
systems is bounded by the inequalities
    \sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i
These are best limits possible. The Q's may refer either to
key or to message.
The upper limit is achieved, for example, in strongly
ideal systems (to be described later) where the decomposition
is into the simple transformations of the system. The lower
limit is achieved if all the systems R, S, ..., U go to
completely different cryptogram spaces. This theorem is also proved
by the general inequalities governing equivocation,
    Q_A(B) \le Q(B) \le Q(A) + Q_A(B).
We identify A with the particular system being used and B with
the key or message.
There is a similar theorem for weighted sums of
languages.
Theorem 16: Suppose a system can be applied to languages
L_1, L_2, ..., L_m and has equivocation
characteristics Q_1, Q_2, ..., Q_m. When applied to
the weighted sum \sum p_i L_i, the equivocation Q
is bounded by
    \sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i
These limits are the best possible and the equivocations in
question can be either for key or message.
The proof here is essentially the same as for the
preceding case.
An important consequence of the result
    Q(K) = |K| + |M| - |E|
is the following.
Theorem 17: In any closed system, or any system where
the total number of possible cryptograms is
equal to the number of possible messages
of N letters,
    Q(K) \ge |K| - (|M_0| - |M|) = |K| - D_N
where |M_0| = log H, with H the number of
possible messages of N letters. D_N is the total
redundancy for N letters.
This is true since |M_0| \ge |E|, the equality holding
only if all cryptograms are equally likely. The theorem shows
that in a closed system the key is determined only by the
redundancy of the language -- the equivocation can decrease only
as the redundancy comes into action and at no greater rate.
Suppose we have a pure system and let the different
residue classes of messages be C_1, C_2, ..., C_r. The
corresponding set of residue classes of cryptograms is C_1', ..., C_r'.
The probability of each E in C_i' is the same:
    P(E) = \frac{P(C_i)}{\phi_i},    E \in C_i'
where \phi_i is the number of different messages in C_i. Thus
    |E| = -\sum_i P(C_i) \log \frac{P(C_i)}{\phi_i}
Substituting in our equation for Q, we obtain:
Theorem 18: For a pure cipher
    Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\phi_i}
This result can be used to compute Q in many cases of interest.
From the analytic point of view pure ciphers have a
simple structure. If a cryptogram is intercepted its residue
class gives the complete information obtained by the cryptanalyst.
Within the residue class the system is perfect -- each message
in the class has an a posteriori probability equal to its
a priori probability. For large N, beyond the unicity point,
there will usually only be one M in the class of reasonable
probability, and the problem is to determine this M.
The theorem on equivocation of pure ciphers can be
altered to show this. We have (k being the number of keys,
all equally likely)
    \sum_i P(C_i) \log \frac{P(C_i)}{\phi_i}
        = \sum_i P(C_i) \log P(C_i) + \sum_i P(C_i) \log \frac{k}{\phi_i}
          - \sum_i P(C_i) \log k
        = \sum_i P(C_i) \log P(C_i) + Q_M(K) - |K|
Hence
    Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\phi_i}
         = |M| + Q_M(K) + \sum_i P(C_i) \log P(C_i)
and
    Q(M) = |M| - [ -\sum_i P(C_i) \log P(C_i) ]
The equivocation of message is the equivocation of message before
the cryptogram was intercepted less the information imparted by
specification of its residue class.
21. Key Appearance Characteristic
Suppose the cryptanalyst has N letters of message
and N letters of the equivalent cryptogram. Then he can
calculate the a posteriori probabilities of the various keys on
the basis of this information, and if N is small there will
remain a certain equivocation of key. For example in simple
substitution, knowing 20 letters of message and cryptogram
does not disclose the entire key, since only about 12 letters
of the 26 will be represented. Thus there is a residual
equivocation of log (26-12)!, if exactly 12 letters appear.
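To give a sense of scale, the residual equivocation log (26-12)! can be evaluated directly. The Python sketch below is a small illustration; the function name is our own, and the logarithm of the factorial is obtained from the standard library's log-gamma function, using (n)! = Gamma(n + 1).

```python
import math

def residual_key_equivocation(alphabet_size, letters_seen, base=2):
    """log (alphabet_size - letters_seen)! in the given logarithmic base.

    All arrangements of the unused letters are taken as equally likely.
    """
    unused = alphabet_size - letters_seen
    # lgamma(unused + 1) is the natural logarithm of unused!.
    return math.lgamma(unused + 1) / math.log(base)

bits = residual_key_equivocation(26, 12)              # alternatives (base 2)
digits = residual_key_equivocation(26, 12, base=10)   # decimal digits
```

With 12 letters known, the 14 remaining substitutes can still be arranged in 14! ways, which amounts to roughly 36 alternatives, or about 11 decimal digits, of key left undetermined.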
We define the mean residual key equivocation as
    Q_M(K) = -\sum P(E,M) P_{E,M}(K) \log P_{E,M}(K)
where P(E,M) is the a priori probability of having message M
and cryptogram E, and P_{E,M}(K) is the conditional probability
of K with E and M given.
This may be written by obvious arguments (assuming
all keys equally likely)
    Q_M(K) = \sum P(M,K) \log X(M,K)
where X(M,K) is the number of different keys from M in parallel
with K, that is which go to the same E as K.
For simple substitution let P_x be the probability
that a received cryptogram of N letters has x different letters
appearing in it. Then
    Q_M(K) = \sum_x P_x \log (26 - x)!
Approximately
log lbgV^26A)
, r
The bracketed terms vary slowly with x and if P_x is fairly
well concentrated, we may take the bracket out replacing x
by its mean value \bar{x}. This gives, after recombination
    Q_M(K) = \log (26 - \bar{x})!
This residual key equivocation is shown for simple
substitution on English in Fig. 12. It measures how much of the
key has not been used in enciphering N letters of text on
the average.
Theorem 19:    Q(K) = Q(M) + Q_M(K)
That is, the total key equivocation (when we don't know the
message) is the sum of the message equivocation and the
residual key equivocation; i.e., the equivocation there would
be in the key if we did know the message. This follows from
the fact that the key uniquely determines the message and
properties 4 and 5 in Section 1.
22. Equivocation for Simple Substitution on an Independent
Letter Language
We will now calculate the mean equivocation in key
or message when simple substitution is applied to a two
letter language, probabilities p and q for 0 and 1, with
successive letters independent. We have
    Q(M) = Q(K) = -\sum P(E) P_E(K) \log P_E(K)
The probability that E contains exactly s 0's in a particular
permutation is
    \frac{1}{2} (p^s q^{N-s} + q^s p^{N-s})
and the a posteriori probabilities of the identity and
inverting substitutions are respectively
    P_E(0) = \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}},
    P_E(1) = \frac{q^s p^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}
There are \binom{N}{s} terms for each s and hence
    Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s}
           \log \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}
This may be written
    Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s} [ s \log p + (N-s) \log q
           - \log (p^s q^{N-s} + q^s p^{N-s}) ]
         = -N [p \log p + q \log q]
           + \sum_s \binom{N}{s} p^s q^{N-s} \log (p^s q^{N-s} + q^s p^{N-s})
         = N R + \frac{1}{2} \sum_s \binom{N}{s} (p^s q^{N-s} + q^s p^{N-s})
           \log (p^s q^{N-s} + q^s p^{N-s})
where R = -(p \log p + q \log q).
For p = 1/3, q = 2/3, and for p = 1/8, q = 7/8, Q has been
calculated and is shown in Fig. 13.
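The equivocation characteristic for the two letter language is easy to tabulate by machine. The Python sketch below evaluates Q(N) from its definition as a sum over the number s of 0's in the cryptogram, under the assumption stated in the text that the two substitutions (identity and inversion) are equally likely. The function name is hypothetical and logarithms are to base 2, so the result is in alternatives.

```python
import math

def equivocation_two_letter(p, n_letters):
    """Mean key equivocation Q(N), in alternatives, for simple substitution on
    an independent two-letter language with probabilities p and q = 1 - p.

    The two keys (identity and inversion) are taken as equally likely.
    """
    q = 1 - p
    total = 0.0
    for s in range(n_letters + 1):
        a = p**s * q**(n_letters - s)   # probability of E given the identity key
        b = q**s * p**(n_letters - s)   # probability of E given the inverting key
        p_e = 0.5 * (a + b)             # probability of one particular E with s 0's
        for like in (a, b):
            post = like / (a + b)       # a posteriori probability of that key
            if post > 0:
                total -= math.comb(n_letters, s) * p_e * post * math.log2(post)
    return total

# Characteristic for p = 1/3, q = 2/3 and N = 0, 1, ..., 10.
characteristic = [equivocation_two_letter(1 / 3, n) for n in range(0, 11)]
```

The list starts at one alternative for N = 0, equals R for N = 1, and never increases with N; for p = q = 1/2 the cryptogram carries no information about the key and Q(N) stays at one alternative.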
Now assume the language contains r different
letters chosen independently and with probabilities p_1,
p_2, ..., p_r. By approximately the same argument we have
    Q(N) = -\sum \binom{N}{s_1 \ldots s_r} p_1^{s_1} p_2^{s_2} \cdots p_r^{s_r}
           \log \frac{p_1^{s_1} \cdots p_r^{s_r}}
                     {\sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r}}
where \sum s_i = N and \sum_\alpha is over all permutations of
1, 2, ..., r for \alpha_1, ..., \alpha_r.
Hence, by obvious transformations
    Q(N) = N R + \frac{1}{r!} \sum \binom{N}{s_1 \ldots s_r}
           \left( \sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r} \right)
           \log \sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r}
where R = -\sum p_i \log p_i. In particular,
    Q(0) = \frac{1}{r!} r! \log r! = \log r! = |K|
    Q(1) = R + \frac{1}{r!} r (r-1)! \log (r-1)!
         = R + \log (r-1)!
This checks the evident answer for Q(1) -- the first
symbol has equivocation R and the parts of the key not used
add log (r-1)!.
23. The Equivocation Characteristic for a "Random" Closed
    Cipher
In the preceding section we have calculated the
equivocation characteristic for a simple substitution applied
to an independent letter language. This is about the simplest
type of cipher and the simplest language structure possible,
yet already the formulas are so involved as to be nearly
useless. What are we to do with cases of practical interest,
say the involved transformations of a fractional transposition
system applied to English with its extremely complex
statistical structure? This complexity itself suggests the
method of approach. Sufficiently complicated problems can
frequently be solved statistically. In order to do this we
define the notion of a "random" cipher.
We suppose that the possible messages of length N
can be divided into two groups, one group of high and fairly
uniform probability, while the total probability in the
second group is small. This is usually possible in information
theory if the messages have any reasonable length. Let
the total number of messages be
    $H = 2^{R_0 N}$
where R_0 is the maximum rate and N the number of letters. The
high probability group will contain about
    $S = 2^{RN}$
where R is the statistical rate.
The deciphering operation defines a function $M = T_i^{-1} E$
which can be thought of as a series of lines, k for each E,
going back to various M's. By a random cipher we will mean
one in which all keys are equally likely and the k lines
from any E go back to random M's. The equivocation in key
is given by
    $Q(K) = -\sum P(E) P_E(K) \log P_E(K)$
The probability of exactly m lines going back
to the high probability group is
    $\binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m}$
If a cryptogram with m lines going to high probability messages
is intercepted, the equivocation is log m. The probability
of intercepting such a cryptogram is easily seen to be
    $\frac{mH}{Sk}$
Hence the mean equivocation is
    $Q = \frac{H}{Sk} \sum_{m=1}^{k} \binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m} m \log m$
We wish to find an approximation to this for large k. If the
expected value of m, namely $\bar{m} = \frac{S}{H} k$, is $\gg 1$, the variation of
log m over the range where the binomial distribution assumes
large values will be small and we can replace log m by $\log \bar{m}$.
This then comes out of the summation leaving the expected
value of m, which the factor H/(Sk) cancels.
Hence in this condition
    $Q = \log \frac{S}{H} k$
    $= \log S - \log H + \log k$
    $= |K| - |M_0| + |M|$
    $= |K| - ND$
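If $\bar{m}$ is small compared to the large k, the binomial
distribution can be approximated by a Poisson distribution:*
    $\binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m} \approx \frac{e^{-\lambda} \lambda^m}{m!}, \quad \lambda = \frac{Sk}{H}$
Hence
    $Q = \frac{1}{\lambda} e^{-\lambda} \sum_{m=2}^{\infty} \frac{\lambda^m}{m!} \, m \log m$
    $= e^{-\lambda} \sum_{m=1}^{\infty} \frac{\lambda^m}{m!} \log (m+1)$
when we write (m + 1) for m. This may be used in the region
where $\lambda$ is near unity. For $\lambda \ll 1$ the only important term in
the series is m = 1; omitting the others
    $Q = e^{-\lambda} \lambda \log 2$
    $\approx \lambda \log 2$
    $= 2^{|K|} \, 2^{-ND} \log 2$
*Fry, Probability and Its Engineering Uses, p. 214.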
Thus Q(K) starts off at |K| and decreases linearly
with slope -D out to the neighborhood of N = |K|/D. After a
short transition region, Q follows an exponential with "half
life" distance 1/D if D is in alternatives per letter. If D
is in digits per letter 1/D is the distance for a decrease
by a factor of 10. The behavior is shown in Fig. 14 with
the approximating curves.
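The linear and Poisson approximations can be compared with the
exact binomial sum numerically. The following Python sketch is
illustrative only: the function names, the use of base-2
logarithms, and the sample values of k and S/H are assumptions
made here and do not come from the memorandum.

```python
from math import exp, factorial, lgamma, log, log1p, log2

def binom_pmf(k, m, p):
    """Probability that exactly m of the k lines reach the high group (0 < p < 1)."""
    log_pmf = (lgamma(k + 1) - lgamma(m + 1) - lgamma(k - m + 1)
               + m * log(p) + (k - m) * log1p(-p))
    return exp(log_pmf)

def q_exact(k, ratio):
    """(H / (S k)) * sum_m C(k,m) p^m (1-p)^(k-m) m log m, with p = S/H = ratio."""
    mean_m = ratio * k
    # The m = 1 term vanishes because log 1 = 0, so the sum starts at m = 2.
    total = sum(binom_pmf(k, m, ratio) * m * log2(m) for m in range(2, k + 1))
    return total / mean_m

def q_linear(k, ratio):
    """Approximation log(S k / H), valid when the mean of m is large."""
    return log2(ratio * k)

def q_poisson(lam, terms=60):
    """Approximation e^-lam * sum_m lam^m / m! * log(m + 1), for small lam."""
    return exp(-lam) * sum(lam**m / factorial(m) * log2(m + 1)
                           for m in range(1, terms + 1))

if __name__ == "__main__":
    print(q_exact(1024, 0.25), q_linear(1024, 0.25))
    print(q_exact(1024, 1e-4), q_poisson(1024 * 1e-4))
```

With k = 1024 and S/H = 1/4 the mean of m is 256 and the exact sum
lies close to log 256; with S/H = 0.0001 the Poisson form is the
appropriate one.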
By a similar argument given in the appendix, the
equivocation of message can be calculated. It is
    $Q(M) = |M_0| = R_0 N$          for  $R_0 N \ll Q(K) = |K| - DN$
    $Q(M) = Q(K)$                   for  $R_0 N \gg Q(K)$
    $Q(M) = Q(K) - \varphi(N)$      for  $R_0 N \sim Q(K)$
where $\varphi(N)$ is the function of Fig. 14, with N scale reduced
by a factor of D/R_0. Q(M) rises linearly with slope R_0 until
this line intersects the Q(K) line. After a rounded transition
it follows Q(K) down.
Most ciphers have an equivocation characteristic
of this general type, approaching zero rather sharply. We
will call the number of letters required for near unicity of
solution the unicity distance.
24. Application to Standard Ciphers.
The characteristic derived for the random cipher
may be expected to apply approximately in many cases, providing
some precautions are taken and certain corrections
are made. The main points to be observed are the following:
1. We assumed in deriving the random characteristic
   that the possible decipherments of a cryptogram
   are a random selection from the possible messages.
   This is not true in actual cases, but becomes more
   nearly true as the complexity of the operations
   used in the enciphering process and the complexity
   of the language structure increase. The more
   complicated the type of cipher, the more it should
   follow the random characteristic. In the case of
   a transposition cipher it is clear that letter
   frequencies are preserved. This means that the
   possible decipherments are chosen from a more
   limited group - not the entire message space -
   and the formula should be changed. In place of
   R_0 one uses R_1, the rate for independent letters
   but with the regular frequencies. This changes
   the redundancy from
       D = R_0 - R = .707 digits/letter
   to
       D' = R_1 - R = .538 digits/letter
   and the equivocation reduces more slowly. In
   some other cases a definite tendency toward
   returning the decipherments to high probability
   messages can be seen. If there is no clear
   tendency of this sort, and the system is fairly
   complicated, and the language a natural one
   (with its very complex statistical structure),
   then it is reasonable to make the random cipher
   assumption.
2. In many cases the key does not all appear as
   soon as it might. For example in simple substitution
   one must wait for a long time to find
   all letters of the alphabet represented in the
   message and thus deduce the complete key. The
   message becomes unique long before this point.
   Obviously our random assumption falls down in
   such a case, since all the different keys which
   differ only in the letters not yet appearing
   lead back to the same message, and are not
   randomly distributed. This error is easily
   corrected by the use of the key appearance
   characteristic. One uses, at a particular N, the amount
   of key that may be expected at that point in the
   formula for Q.
3. There are certain "end effects" due to the definite
   starting of the message which produce a discrepancy
   from the random characteristics. If we take a
   random starting point in English text the first
   letter (when we do not observe the preceding
   letters) has a possibility of being any letter with
   the ordinary letter probabilities. The next
   letter is more completely specified since we
   then have digram frequencies. This decrease
   in choice value continues for some time. The
   effect of this on the curve is that the straight
   line part is displaced, and approached by a
   curve depending on how much the statistical
   structure of the language is spread out over
   adjacent letters. As a first approximation
   the curve can be corrected by shifting the line
   over to the half redundancy point - i.e., the
   number of letters where the language redundancy
   is half its final value.
If account is taken of these three effects,
reasonable estimates of the equivocation characteristic and
unicity point can be made. The calculation can be done
graphically as indicated in Figs. 15 and 16. One draws the
key appearance characteristic |K| - Q_M(K) and the total
redundancy curve |M_0| - |M| (which is usually sufficiently
well represented by the line ND). The difference between
these out to the neighborhood of their intersection is Q(M).
For the simple substitution the characteristic is shown
in Fig. 17. In so far as experimental checks could be
carried out they fit this curve very well. For example, the
unicity point, at about 27 letters, can be shown
experimentally to lie between the limits 22 and 30. With 30 letters
one nearly always has a unique solution to a cryptogram of
this type and with 22 it is usually easy to find a number of
them.
With transposition of period d, the unicity point
occurs at about 1.5 d log d/e. This also checks fairly well
experimentally. Note that in this case Q is defined only
for integral multiples of d.
With the Vigenere the unicity point will occur at
about 2d + 2 letters, and this too is about right. The
Vigenere characteristic with the same key size as simple
substitution will be approximately as shown in Fig. 18. The
Vigenere, Playfair and Fractional cases are more likely to
follow the theoretical formulas for random ciphers than
simple substitution and transposition. The reason for this
is that they are more complex and give better mixing
characteristics to the messages on which they operate.
The mixed alphabet Vigenere (each of d alphabets
mixed independently and used sequentially) has a key size
    $|K| = d \log 26! = 26.3 d$
and its unicity point should be at about 53 d + 2 letters.
These conclusions can also be put to a rough
experimental test with the Caesar type cipher. In the
particular cryptogram analyzed in Table I, section 19, the
function Q(N) has been calculated and is given below, together
with the values for a random cipher.

    N                 0      1      2      3      4      5
    Q (observed)     1.41   1.25   1.00    .60    .34    0
    Q (calculated)   1.41   1.25    .98    .54    .15    .03
The agreement is seen to be quite good, especially
when we remember that the observed Q should actually be the
average of many different cryptograms, and that D for the
larger values of N is only roughly estimated.
It appears then that the random cipher analysis
can be used to estimate equivocation characteristics and
the unicity distance for the ordinary types of ciphers.
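The straight-line part of the random characteristic, Q = |K| - ND,
reaches zero near N = |K|/D, which gives a quick estimate of the
unicity distance. The sketch below is an illustration under stated
assumptions: the function names are hypothetical, logarithms are to
base 10 (digits), D = .707 digits per letter is the figure quoted
above, and the end-effect and key-appearance corrections discussed
in this section are deliberately left out.

```python
from math import log10

def unicity_distance(key_size, redundancy):
    """Letters needed for the line |K| - N*D to reach zero (no corrections)."""
    return key_size / redundancy

def vigenere_key_size(period):
    """Key size in digits for a Vigenere of the given period: d * log10(26)."""
    return period * log10(26)

if __name__ == "__main__":
    for d in (5, 10, 20):
        n = unicity_distance(vigenere_key_size(d), 0.707)
        print(d, round(n, 1))   # close to 2d, before any end-effect correction
```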
25. Solving Systems Using Only N-Gram Structure.
The preceding analysis can also be applied to cases
where the cryptanalyst is assumed to know or use only a
limited knowledge of the structure of the language. If no
data about the language other than the digram frequencies
is used in solving cryptograms the equivocation curves may
be computed, using for the redundancy curve that obtained
from D_2 alone. This curve lies below the curve for all
redundancy and the unicity point will therefore be moved to
a larger N. Fig. 19 shows the Q curves for simple substitution
on normal English when the cryptanalyst uses only
digram structures.
26. Validity of a Cryptogram Solution.
The equivocation formulas are relevant to questions
which sometimes arise in cryptographic work regarding the
validity of an alleged solution to a cryptogram. In the
history of cryptography one finds many cryptograms, or
possible cryptograms, where clever analysts have found a
"solution". It involved, however, such a complex process, or
the material was so scanty, that the question arose as to
whether the cryptanalyst had "read a solution" into the
cryptogram. See for example the Bacon-Shakespeare ciphers
and the "Roger Bacon" manuscript.*
In general we may say that if a proposed system
and key solves a cryptogram for a length of material considerably
greater than the unicity distance the solution is trustworthy.
If the material is of the same order as or shorter
than the unicity distance the solution is highly suspicious.
This effect of redundancy in gradually producing a
unique solution to a cipher can be thought of in another way
which is helpful. The redundancy is essentially a series of
conditions on the letters of the message, which insure that
it be statistically reasonable. These consistency conditions
produce corresponding consistency conditions in the
cryptogram. The key gives a certain amount of freedom to the
cryptogram, but as more and more letters are intercepted,
the consistency conditions use up the freedom allowed by the
key. Eventually there is only one message and key which
satisfy all the conditions and we have a unique solution.
In the random cipher the consistency conditions are in a
sense "orthogonal" to the "grain of the key", and have their
full effect in eliminating messages and keys as rapidly as
possible. This is the usual case. However, by proper design
it is possible to "line up" the redundancy of the
language with the "grain of the key" in such a way that the
consistency conditions are automatically satisfied and Q
does not approach zero. These "ideal" systems are of such
a nature that the transformations T_i all induce the same
probabilities in the E space. Ideal characteristics are
shown in Fig. 20.
27. Ideal Secrecy Systems.
We have seen that perfect secrecy requires an
infinite amount of key. With a finite key size, the equivocation
of key and message generally approach zero, but not
necessarily so. In fact it is possible for Q(K) to remain
constant at its initial value |K|. Then, no matter how
much material is intercepted, there is not a unique solution
but many of comparable probability. We will define an
"ideal" system as one in which Q(K) and Q(M) do not approach
zero as N → ∞. A "strongly ideal" system is one in which
Q(K) remains constant at |K|.
*See Fletcher Pratt, "Secret and Urgent"
An example is a simple substitution on an artificial
language in which all letter probabilities are the same and
each letter independently chosen. It is clear that Q(K) = |K|
and Q(M) rises linearly along a line of slope R_0 until it
strikes the line Q(K), after which it remains constant at
this value.
With natural languages it is in general possible
to approximate the ideal characteristic - the unicity point
can be made to occur for as large N as is desired. The
complexity of the system needed usually goes up rapidly as
we attempt to do this, however. It is not always possible
to actually attain the ideal characteristic with any system
of finite complexity.
To approximate the ideal equivocation, one may
first operate on the message with a transducer which reduces it
to the normal form - i.e., with all redundancies removed.
After this almost any simple ciphering system - substitution,
transposition, Vigenere etc. - is satisfactory. The more
elaborate the transducer and the nearer the output is to
normal form, the more closely will the secrecy system
approximate the ideal characteristic.
Theorem 20: A necessary and sufficient condition that T be
strongly ideal is that for any two keys $T_i^{-1} T_j$ is a measure
preserving transformation of the message space $\Omega_M$ into itself.
This is true since the a posteriori probability
of each key is equal to its a priori probability if and only
if this condition is satisfied.
28. Examples of Ideal Secrecy Systems.
Suppose our language consists of a sequence of
letters all chosen independently and with equal probability.
Then the redundancy is zero, |M_0| = |M|, and from Theorem 11
Q(K) = |K|. We obtain the result
Theorem 21: If all letters are equally likely and independent
any closed cipher is strongly ideal.
The equivocation of message will rise along the
key appearance characteristic |K| - Q_M(K) which will usually
approach |K|, although in some cases it does not. In the
cases of N-gram substitution, transposition, Vigenere and
variations, fractional, etc., we have strongly ideal systems
for this simple language with Q(M) → |K| as N → ∞.
If the letters are independent but are not all
equally probable, the transposition cipher characteristics
remain essentially the same. The asymptotic equivocations
of both key and message are clearly |K|. In the substitution
cipher they will be less. If all the letter probabilities are
different, then the asymptotic equivocations of both key and
message are zero. The letters can all eventually be determined
by frequency count (apart from certain exceptional
sequences of zero measure). Suppose now that there are 7
letters with probabilities
    $p_1 = p_2 < p_3 < p_4 = p_5 = p_6 < p_7$
In this case we cannot separate p_1 from p_2, or p_4, p_5 and p_6
from each other, but the different unequal probability groups
can be eventually separated.
If all substitutions are a priori equally likely,
there will be an asymptotic uncertainty among
    $2! \times 3!$
equally likely (a posteriori) keys. Hence, the asymptotic Q will
be
    $\log 2! \, 3!$
In general it is clear that the asymptotic equivocation with
a substitution where the different substitutions are equally
likely is
    $Q_\infty(M) = Q_\infty(K) = \log H$
where H is the order of the group of substitutions on the
letter probabilities p_1 ... p_n which leave this set invariant.
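The group order H is the product of the factorials of the sizes of
the groups of equal probabilities, so the asymptotic equivocation
is easy to compute. The sketch below is illustrative only; the
function names and the sample probability values are assumptions,
chosen to reproduce the pattern of two equal, one single, three
equal and one single probability described above.

```python
from collections import Counter
from math import factorial, log2

def invariant_group_order(probs):
    """Number of substitutions that leave the set of letter probabilities unchanged."""
    order = 1
    for multiplicity in Counter(probs).values():
        order *= factorial(multiplicity)   # letters of equal probability can be permuted freely
    return order

def asymptotic_equivocation(probs):
    """log H in bits, assuming all substitutions are a priori equally likely."""
    return log2(invariant_group_order(probs))

if __name__ == "__main__":
    # Hypothetical values: two equal, one single, three equal, one single.
    probs = [0.05, 0.05, 0.10, 0.15, 0.15, 0.15, 0.35]
    print(invariant_group_order(probs), asymptotic_equivocation(probs))
```

Equal probabilities are detected here by exact equality of the
supplied values; measured frequencies would need a tolerance.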
More generally we can consider an arbitrary pure
system T and a pure language L. Suppose that T operates
only "locally" on the letters of M in the sense that the nth
letter of cryptogram depends only on n and a certain finite
number of the letters of M in the neighborhood of the nth
one:
    $e_n = f(K, n; m_{n-p}, ..., m_n, ..., m_{n+p})$
Then we can show that there is a certain subgroup of the
transformations $T_i^{-1} T_j$ which are probability preserving in the
language L. In the limiting cases these would consist of
the identity or of the whole group $T_i^{-1} T_j$.
Theorem 22: Under these conditions the asymptotic equivocation
of key is the logarithm of the order of this subgroup of
measure preserving transformations.
An ideal secrecy system suffers from a number of
disadvantages.
1. The system must be closely matched to the language.
   This requires an extensive study of the structure
   of the language by the designer. Also a change in
   statistical structure or a selection from the set
   of possible messages, as in the case of probable
   words (words expected in this particular cryptogram),
   renders the system vulnerable to analysis.
2. The structure of natural languages is extremely
   complicated, and this reflects in a complexity of
   the transformations required to reduce them to
   the normal form. Thus any machine to perform this
   operation must necessarily be quite involved, at
   least in the direction of information storage,
   since a "dictionary" of magnitude greater than
   that of an ordinary dictionary is to be expected.
3. In general, reduction of a natural language to a
   normal form introduces a bad propagation of error
   characteristic. Error in transmission of a single
   letter produces a region of changes near it of
   size comparable to the length of statistical
   effects in the original language.
29. Multiple Substitute Ideal Systems.
There is another way of obtaining ideal or nearly
ideal characteristics using multi-valued secrecy systems.
Suppose our language contains only three letters with
probabilities 1/8, 3/8 and 4/8, and that successive letters
in a message are chosen independently. Let there be 1
substitute for the first letter, 3 for the second and 4 for
the third, and choose at random among the possible
substitutes for a letter. It is clear that this system is ideal.
If the different probabilities are incommensurable, we cannot
exactly achieve the ideal behavior, but can approximate it,
by using enough substitutes, as closely as desired.
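The three-letter example can be checked directly: giving each
letter a number of substitutes proportional to its probability
makes every substitute equally likely in the cryptogram. The
sketch below is illustrative; the letter labels, the substitute
symbols 0 to 7 and the function name are assumptions.

```python
from fractions import Fraction

# Letter probabilities from the example; the labels "a", "b", "c" are hypothetical.
LETTER_PROBS = {"a": Fraction(1, 8), "b": Fraction(3, 8), "c": Fraction(4, 8)}
# 1, 3 and 4 substitutes respectively; the symbols themselves are arbitrary.
SUBSTITUTES = {"a": [0], "b": [1, 2, 3], "c": [4, 5, 6, 7]}

def substitute_probs(letter_probs, substitutes):
    """Probability of each cryptogram symbol when substitutes are chosen at random."""
    probs = {}
    for letter, p in letter_probs.items():
        subs = substitutes[letter]
        for symbol in subs:
            probs[symbol] = p / len(subs)   # uniform choice among this letter's substitutes
    return probs

if __name__ == "__main__":
    for symbol, p in sorted(substitute_probs(LETTER_PROBS, SUBSTITUTES).items()):
        print(symbol, p)
```

Every symbol comes out with probability 1/8, so the cryptogram is
a sequence of independent, equally likely symbols.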
If the language is more complex, with transition
probabilities, this general method can still be used, but it
becomes more involved. Suppose the choice of a letter
depends only on the two preceding letters, not on any more
remote part of the message. The transition probabilities
p_ij(k) completely describe the statistical structure of the
language. We supply substitutes for k when it follows i, j in
proportion to p_ij(k). Of all our m substitutes, m p_ij(k)
represent k after the pair i, j. As before one chooses from
the possible substitutes for a letter at random. The
cryptogram will then be a random sequence of the m substitute
letters.
As an example, suppose the p_i(j) are the only
statistics of the language and the values are given by

    i \ j     1     2     3
      1      .1    .3    .6
      2      .2    .5    .3
      3      .9    .1     0

With 10 substitutes 0, 1, 2, ..., 9 we construct a substitution
table assigning substitutes (chosen randomly) in proportion
to the frequencies. The following is a typical key.

    i \ j     1                    2            3
      1       7                    0,5,6        1,2,3,4,8,9
      2       3,9                  1,2,5,6,7    0,4,8
      3       0,1,2,3,5,6,7,8,9    4

If a 3 follows a 2 in the message we substitute one of 0, 4, 8
for it, the choice being random. A second table must be
supplied for the first letter of the message, corresponding to
unconditional probabilities of the three letters.
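The construction of such a key, and the fact that it can be
deciphered unambiguously, can be sketched in Python. This is an
illustration under assumptions: the function names are
hypothetical, the key is drawn at random rather than copied from
the table above, and the first-letter split `FIRST_COUNTS` is an
invented placeholder because the text does not give the
unconditional probabilities.

```python
import random

# Transition probabilities p_i(j) from the example, scaled to counts out of 10.
COUNTS = {1: {1: 1, 2: 3, 3: 6},
          2: {1: 2, 2: 5, 3: 3},
          3: {1: 9, 2: 1, 3: 0}}
# Hypothetical split for the first letter (not given in the text).
FIRST_COUNTS = {1: 4, 2: 3, 3: 3}

def make_row(counts, rng):
    """Partition the substitutes 0..9 among the letters in proportion to counts."""
    digits = list(range(10))
    rng.shuffle(digits)
    row, pos = {}, 0
    for letter, n in counts.items():
        row[letter] = digits[pos:pos + n]
        pos += n
    return row

def make_key(rng):
    key = {i: make_row(c, rng) for i, c in COUNTS.items()}
    key[0] = make_row(FIRST_COUNTS, rng)   # row 0 serves the first letter
    return key

def encipher(message, key, rng):
    out, prev = [], 0
    for letter in message:
        out.append(rng.choice(key[prev][letter]))   # random choice among substitutes
        prev = letter
    return out

def decipher(cryptogram, key):
    out, prev = [], 0
    for digit in cryptogram:
        letter = next(l for l, subs in key[prev].items() if digit in subs)
        out.append(letter)
        prev = letter
    return out

if __name__ == "__main__":
    rng = random.Random(1)
    key = make_key(rng)
    message = [1, 3, 1, 2, 2, 3, 1]   # contains no 3 followed by 3, which has probability 0
    cryptogram = encipher(message, key, rng)
    print(cryptogram, decipher(cryptogram, key))
```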
Although of theoretical interest it is doubtful
whether such systems would be of much use practically because
of their complexity and message expansion in ordinary cases.
However, the first approximation to such systems, matching
letter frequencies, has been used in ciphers and is standard
practice in codes (where one matches word frequencies).
30. Equivocation Rate.
We now return briefly to cases where the key is
not finite, but is supplied constantly, as in the Vernam system
and the running key cipher. In such cases we may define
equivocation "rates". One considers the equivocation Q(N)
of the message when N letters have been intercepted. The
equivocation rate for the message is defined as the limit
(assuming it exists):
    $Q' = \lim_{N \to \infty} \frac{Q(N)}{N}$
The rate for equivocation of key would be defined similarly,
using the equivocation in the part of the key that has been
used only, but of course these two are the same. There are
results for these parameters analogous to those obtained
with finite key cases. Let R' be the mean rate of using
key.
Theorem 23:
    $Q' \le R'$
In case the equality holds we have the analogue of ideal
systems where the complete information of the key goes into
equivocation. If R' ≥ R, the rate of the message source,
we can obtain perfect secrecy - in fact we may define perfect
secrecy as the case in which Q' = R.
In the random case we have the analogous result
    $Q' = R' - D$
31. Further Remarks on Equivocation and Redundancy.
We have taken the redundancy of "normal English"
to be about .7 digits per letter or 50% of R_0. This is on
the assumption that word divisions were omitted. It is an
approximate figure based on statistical structure of the
order of lengths of perhaps 8 letters, and assumes the text
to be of an ordinary type, such as newspaper writing,
literary work, etc. Various methods of calculating
redundancy have been devised and will be described in the
memorandum on information mentioned in the introduction.
We may note here two methods of roughly estimating
this number which are of cryptographic interest.
A running key cipher is a Vernam type system where
in place of a random sequence of letters the key is a
meaningful text. Now it is known that running key ciphers
can usually be solved uniquely. This shows that English
can be reduced by a factor of two to one and implies a
redundancy of at least 50%. This figure cannot be reduced
very much, however, for a number of reasons, unless long
range "meaning" structure of English is considered.
The running key cipher can be easily improved to
lead to ciphering systems which could not be solved without
the key. If one uses in place of one English text, about
4 different texts as key, adding them all to the message,
a sufficient amount of key has been introduced to produce
a high positive equivocation rate. Another method would
be to use say every 10th letter of the text as key. The
intermediate letters are omitted and cannot be used at any
other point of the message. This has the same effect, since
the mean rate for these spaced letters must be over .8 R_0.
These methods might be useful for spies or diplomats
who could use books or magazines for the key source.
A second way of showing the high redundancy of
English is to delete all vowels from a passage. In general
it is possible to fill them in again uniquely and recover
the original, without knowing it in advance. As the vowels
constitute about 40% of the text this puts a limit on the
redundancy. Actually there is considerable redundancy left,
the various letter and digram frequencies being far from
uniform.
This suggests a simple way of greatly improving
almost any simple ciphering system. First delete all vowels,
or as much of the message as possible without running the
risk of multiple solutions, and then encipher the residue.
Since this reduces the redundancy by a factor of perhaps
3 or 4 to 1, the unicity point will be moved out by this
factor. This is one way of approaching ideal systems -
using the decipherer's knowledge of English as part of the
deciphering system.
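The vowel-deletion step can be sketched in a few lines. This is an
illustration only: the function names and the sample message are
hypothetical, Y is not counted as a vowel, and nothing here checks
that the residue can still be read back uniquely.

```python
VOWELS = set("AEIOU")

def strip_vowels(text):
    """Delete all vowels, leaving other characters in place."""
    return "".join(ch for ch in text if ch.upper() not in VOWELS)

def fraction_removed(text):
    """Fraction of the letters in the text that are vowels."""
    letters = [ch for ch in text if ch.isalpha()]
    removed = [ch for ch in letters if ch.upper() in VOWELS]
    return len(removed) / len(letters)

if __name__ == "__main__":
    message = "SEND MORE SUPPLIES"
    print(strip_vowels(message), fraction_removed(message))
```

The residue would then be enciphered by whatever system is in use;
the decipherer restores the vowels from knowledge of English.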
Two extremes of redundancy in English prose are
represented by Basic English and Joyce's "Finnegans Wake".
The Basic English vocabulary consists of only 850 words
and a rough estimate puts the redundancy at about 70%.
A cipher applied to this sort of text would rapidly approach
unicity. Joyce, on the other hand, would be relatively easy
to encipher. The small redundancy is disclosed by the
difficulty in filling in correctly even a single missing letter
from "Finnegans Wake". What the numerical value is would
be difficult to determine - it varies widely throughout the
book.
The mathematical extremes of redundancy, 0 and 100%,
can be constructed in artificial languages. In the first
we have e.g. a single possible message. Q(M) = 0 identically
and Q(K), in the random cipher case, declines as
rapidly as possible, i.e., as rapidly as one sends information
on the system. In the other extreme all letter sequences
are equally likely, and any closed ciphering system is ideal.
We may refer here to a memorandum by Nyquist
(Enciphering - Effect of Redundancy in Language, May 30, 1944)
in which some questions of the type we are considering here
are discussed.
32. Distribution of Equivocation.
A more complete description of a secrecy system
applied to a language than is afforded by the equivocation
characteristics can be found by giving the distribution
of equivocation. For N intercepted letters we consider
the fraction of cryptograms for which Q (for these particular
E's, not the mean Q) lies between certain limits. This
gives a density distribution function
    $P(Q, N) \, dQ$
for the probability that, for N letters, Q lies between the
limits Q and Q + dQ. The mean equivocation we have previously
studied is the mean of this distribution,
    $\bar{Q} = \int Q \, P(Q, N) \, dQ$
The function P(Q,N) can be thought of as plotted along a
third dimension, normal to the paper, on the Q-N plane. If
the language is pure, with a small influence range (compared
to K) and the cipher is pure the function P(Q,N) will
usually be a ridge in this plane whose highest point follows
approximately the mean Q, at least until near the unicity
point. In this case, or when the conditions are nearly
verified, the mean Q curve gives a reasonably complete picture
of the system.
On the other hand, if the language is not pure,
but made up of a set of pure components
    $L = \sum p_i L_i$
having different equivocation curves with the system, say
Q_1, Q_2, ..., Q_n, then the total Q distribution will usually be
made up of a series of ridges. There will be one for each L_i,
weighted in accordance with its p_i. The mean equivocation
characteristic will be a line somewhere in the midst of these
ridges and may not give a very complete picture of the
situation. This is shown in Fig. 21.
A similar effect occurs if the system is not pure
but made up of several systems with different Q curves.
There is then a series of ridges in the P(Q,N) plot, and
the mean Q strikes an average which may lie between ridges
and be a very improbable value of Q for a particular
cryptogram. These effects are illustrated in Fig. 22.
The effect of mixing pure languages which are
near to one another in statistical structure is to increase
the width of the ridge. Near the unicity point this tends
to raise the mean equivocation, since equivocation cannot
become negative and the spreading is chiefly in the positive
direction. We expect therefore, that in this region the
calculations based on the random cipher should be somewhat
low.
PART III
Practical Secrecy
33. The Work Characteristic
After the unicity point has been passed there will
usually be a unique solution to the cryptogram. The problem
of isolating this single solution of high probability is the
problem of cryptanalysis. In the region before the unicity
point we may say that the problem of cryptanalysis is that of
isolating all the possible solutions of high probability
(compared to the remainder) and determining their various
probabilities.
Although it is always possible in principle to determine
these solutions (by trial of each possible key for example),
different enciphering systems show a wide variation in the amount
of work required. The average amount of work to determine the
key for a cryptogram of N letters, W(N), measured say in man hours,
may be called the work characteristic of the system. This
average is taken over all messages and all keys with their
appropriate probabilities.
For a simple substitution on English the work and
equivocation characteristics would be somewhat as shown in
Fig. 23. The dotted portion of the curve is where there are
numerous possible solutions and these must all be determined.
In the solid portion after the unicity point only one solution
exists in general, but if only the minimum necessary data are
given a great deal of work must be done to isolate it. As
more material is used the work rapidly decreases toward some
asymptotic value - where the additional data no longer reduce
the labor.
This is the work characteristic for the key. It is
clear that after the unicity point this function can never
increase. There is also a work characteristic for the message -
the average amount of work to determine the message (or all
reasonable messages). This will, in ordinary cases, be below
or at any rate not far above the work characteristic for the
key, out to fairly large N, since generally if the key is
determined it is easy to find M by the deciphering transformation.
For very large N, however, this function will increase due
merely to the labor of deciphering the large amount of
intercepted material.
Essentially the same behavior would be
expected with any type of secrecy system where the equivocation
approaches zero. The scale of man hours required, however,
will differ greatly with different types of ciphers, even when
the Q curves are about the same. A Vigenere or compound
Vigenere, for example, with the same key size would have a much
better (i.e., much higher) work characteristic. A good practical
secrecy system is one in which the W(N) curve remains sufficiently
high out to the number of letters one expects to transmit
with the key, to prevent the enemy from actually carrying out
the solution, or to delay it to such an extent that the
information is obsolete.
We will consider in the following sections ways of
keeping the function W(N) large, even though Q may be
practically zero. This is essentially a "max min" type of problem as
is always the case when we have a battle of wits.* In designing
a good cipher we must maximize the minimum amount of work
the enemy must do to break it. It is not enough merely to
be sure none of the standard methods of cryptanalysis work -
we must show that no method whatever will break the system
easily. This has been the weakness of many systems;
they were designed to resist all known methods of solution,
but had a structure leading to new methods which, applied to
them, disclosed weaknesses of their own.
- -v flasiKii is essentially on
in a field . • .
v.- e„r« that a system which is not
*See von Neumann and Morgenstern, "Theory of Games". The
situation between the cipher designer and cryptanalyst can be
thought of as a "game" of a very simple structure; a zero-sum two
person game with complete information, and just two "moves". The
cipher designer chooses a system for his "move". Then the
cryptanalyst is informed of this choice and chooses a method of
analysis. The "value" of the play is the average work required to
break a cryptogram in the system by the method chosen.
(1) We can study the possible methods of solution available
to the cryptanalyst and attempt to describe them in sufficiently
general terms to cover any methods he might use. We then
construct our system to resist this "general" method of solution.
(2) We may construct our ciphers in such a way that breaking it
is equivalent to (or requires at some point in the process) the
solution of some problem known to be laborious. Thus, if we
could show that solving a system requires at least as much work
as solving a system of simultaneous equations in a large number
of unknowns, of a complex type, then we will have a lower bound
of sorts for the work characteristic.
The next three sections are aimed at these general
problems. It is difficult to define the pertinent ideas
involved with sufficient precision to obtain results in the form
of mathematical theorems, but it is believed that the conclusions,
in the form of general principles, are correct.

34. Generalities on the Solution of Cryptograms

After the unicity distance has been exceeded in
intercepted material, any system can be solved in principle by
merely trying each possible key until the unique solution is
obtained, i.e., a deciphered message which "makes sense" in
clear. A simple calculation shows that this method of solution
(which we may call complete trial and error) is totally
impractical except when the key is absurdly small.
Suppose, for example, we have a key of 26! possibilities
or about 26.3 digits, the same size as in simple substitution on
English. This is, by any significant measure, a small key. It
can be written on a small slip of paper, or memorized in a few
minutes. It could be registered on 27 switches each having ten
positions or on 88 two position switches.
Suppose further, to give the cryptanalyst every
possible advantage, that he constructs an electronic device to
try keys at the rate of one each microsecond (perhaps
automatically selecting from the results by a $\chi^2$ test for
statistical significance). He may expect to reach the right key
about half way through, and after an elapsed time of about

$$\frac{26!}{2 \times 60^2 \times 24 \times 365 \times 10^6} = 3 \times 10^{12} \text{ years.}$$

In other words, even with a small key complete trial
and error will never be used in solving cryptograms, except in
the trivial case where the key is extremely small, e.g., the
Caesar with only 26 possibilities, or 1.4 digits. The trial
and error which is used so commonly in cryptography is of a
different sort, or is augmented by other means. If one had a
secrecy system which required complete trial and error it would
be extremely safe. Such a system would result, it appears, if
the original messages, all say of 1000 letters, were a random
selection of $2^{RN}$ from the set of all $2^{R_0 N}$ sequences
of 1000 letters. If any of the simple ciphers were applied to
these it seems that little improvement over complete trial and
error would be possible.
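
The elapsed-time estimate for complete trial and error can be checked with a few lines of Python. This is only a sketch: the function name `exhaustive_search_years` is an illustrative choice, and the exact value of 26! (about 4 x 10^26) is used, so the result comes out at roughly twice the figure quoted above (which corresponds to about 2 x 10^26 keys) while remaining of the same order of magnitude.

```python
import math

def exhaustive_search_years(num_keys: int, keys_per_second: float) -> float:
    """Expected years to reach the right key when keys are tried one after another."""
    expected_trials = num_keys / 2            # the right key is reached about half way through
    seconds_per_year = 60**2 * 24 * 365
    return expected_trials / keys_per_second / seconds_per_year

keys = math.factorial(26)                     # keys of a simple substitution on 26 letters
years = exhaustive_search_years(keys, 1e6)    # one key each microsecond

print(f"26! is about {keys:.2e} keys, or {math.log10(keys):.1f} digits")
print(f"expected search time: {years:.1e} years")
print(f"Caesar: 26 keys, or {math.log10(26):.1f} digits")
```

The Caesar case shows the other extreme: 26 trials are no obstacle at all.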
The methods actually used often involve a great deal
of trial and error, but in a different way. First, the trials
progress from more probable to less probable hypotheses, and
second, each trial disposes of a large group of keys, not a
single one. Thus the key space may be divided into say 10
subsets, each containing about the same number of keys. By
at most 10 trials one determines which subset is the correct
one. This subset is then divided into several secondary
subsets and the process repeated. With the same key size
($K = 26! = 2 \times 10^{26}$) we would expect about 26 x 5 or
130 trials as compared to $10^{26}$ by complete trial and error.
The possibility of choosing the most likely of the subsets first
for test would improve this result even more. If the divisions
were into two compartments (the best way) only 90 trials would
be required. Whereas complete trial and error requires trials
to the order of the number of keys, this subdividing trial
and error requires only trials to the order of the key size
in alternatives.
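
The trial counts for subdividing trial and error can be reproduced as follows. This is a rough sketch under the text's own simplifications (equal subsets at every level, about five trials per ten-way division on the average, one trial per two-way division); the function name `subdivision_levels` is hypothetical.

```python
import math

def subdivision_levels(num_keys: int, subsets: int) -> float:
    """Number of successive subdivisions needed to narrow num_keys down to one key."""
    return math.log(num_keys) / math.log(subsets)

keys = math.factorial(26)

# Ten subsets per level: the correct one is found, on the average,
# about half way through the ten, i.e. after about 5 trials.
ten_way_trials = subdivision_levels(keys, 10) * 5

# Two subsets per level: a single test settles which half holds the key.
two_way_trials = subdivision_levels(keys, 2)

print(f"ten-way division: about {ten_way_trials:.0f} trials")
print(f"two-way division: about {two_way_trials:.0f} trials")
```

Both figures are of the order of the key size, against a number of trials of the order of the number of keys for complete trial and error.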
This remains true even when the different keys have
different probabilities. The proper procedure then to minimize
the expected number of trials is to divide the key space into
subsets of equiprobability. When the proper subset is
determined, this is again subdivided into equiprobability
subsets. If this process can be continued the number of trials
expected when each division is into two subsets will be

$$h = \frac{|K|}{\log 2}.$$

If each test has S possible results and each of these
corresponds to the key being in one of S equiprobability
subsets, then

$$h = \frac{|K|}{\log S}$$

trials will be expected. The intuitive significance of these
results should be noted. In the two compartment test with
equiprobability, each test yields one alternative of
information as to the key. If the subsets have very different
probabilities, as in testing a single key in complete trial and
error, only a small amount of information is obtained from the
test. Thus with 26! equiprobable keys, a test of one yields only

$$-\left[ \frac{26! - 1}{26!} \log \frac{26! - 1}{26!} + \frac{1}{26!} \log \frac{1}{26!} \right]$$

or about $10^{-25}$ alternatives of information. Dividing into S
equiprobability subsets maximizes the information obtained from
each trial at log S, and the expected number of trials is the
total information to be obtained, that is the key size, divided
by this amount.
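
The information yielded by testing a single key can be evaluated numerically. The sketch below takes logarithms to base 2, so that the unit is the alternative; the function name `single_test_information` is an illustrative assumption. Because 1/26! is far smaller than floating-point precision, log(1 - p) is computed with `math.log1p`.

```python
import math

def single_test_information(p: float) -> float:
    """Information, in alternatives (log base 2), from a test that succeeds with probability p."""
    # log1p(-p) evaluates log(1 - p) accurately even when p is far below machine precision.
    return -(p * math.log2(p) + (1.0 - p) * math.log1p(-p) / math.log(2))

p = 1 / math.factorial(26)                 # one key out of 26! equiprobable keys
info = single_test_information(p)

print(f"testing one key out of 26!: {info:.1e} alternatives")
print(f"two equiprobable subsets:   {single_test_information(0.5):.1f} alternative")
```

The first figure is of the order of 10^-25, as stated above, against one full alternative for an equiprobable two-way division.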
The question here is similar to various coin weighing
problems that have been circulated recently. A typical
example is the following: It is known that one coin in 27 is
counterfeit, and slightly lighter than the rest. A chemist's
balance is available and the counterfeit coin is to be isolated
by a series of weighings. What is the least number of weighings
required to do this? The correct answer is 3, obtained by first
dividing the coins into three groups of 9 each. Two of these
are compared on the balance. The three possible results
determine the set of 9 containing the counterfeit. This set is
then divided into 3 subsets of 3 each and the process continued.
The set of coins corresponds to the set of keys, the counterfeit
coin to the correct key, and the weighing procedure to a trial
or test.
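
The weighing procedure can be sketched in Python. The weights (10 for a good coin, 9 for the counterfeit) and the function name `find_light_coin` are assumptions made only for illustration; each pass of the loop is one weighing with three possible results.

```python
def find_light_coin(weights):
    """Isolate the single lighter coin with a balance; return (index, weighings used)."""
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = (len(candidates) + 2) // 3           # number of coins placed on each pan
        left = candidates[:third]
        right = candidates[third:2 * third]
        rest = candidates[2 * third:]                # coins left off the balance
        left_weight = sum(weights[i] for i in left)
        right_weight = sum(weights[i] for i in right)
        weighings += 1
        if left_weight < right_weight:
            candidates = left
        elif right_weight < left_weight:
            candidates = right
        else:
            candidates = rest                        # pans balance: the coin is in the rest
    return candidates[0], weighings

coins = [10] * 27
coins[16] = 9                                        # the counterfeit, slightly lighter
print(find_light_coin(coins))
```

With 27 coins the groups are of 9, then 3, then 1, so three weighings suffice wherever the counterfeit lies.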
This method of solution is feasible only if the key
space can be divided into a small number of subsets, with a
simple method of determining to which subset the correct key
belongs. Stated in another way, it is possible to solve for
the key bit by bit. One does not need to assume a complete key
in order to apply a consistency test and determine if the
assumption is justified - an assumption on a part of the key
(or as to whether the key is in some large section of the key
space) can be tested.
This is one of the greatest weaknesses of most
ciphering systems. For example, in simple substitution, an
assumption on a single letter can be checked against its
frequency, variety of contact, doubles or reversals, etc. In
determining a single letter the key space is reduced by 1.4
digits from the original 26. The same effect is seen in all the
elementary types of ciphers. In the Vigenere, the assumption of
two or three letters of the key is easily checked by deciphering
at other points with this fragment and seeing whether clear
emerges. The compound Vigenere is much better from this point of
view, if we assume a fairly large number of component periods,
producing a repetition rate larger than will be intercepted.
Here as many key letters are used in enciphering each letter
as there are periods - although this is only a fraction of the
entire key, at least a fair number of letters must be assumed
before a consistency check can be applied.
Our first conclusion then, regarding practical small
key cipher design, is that a considerable amount of key should
be used in enciphering each small element of the message.

35. Statistical Methods
It is possible to solve many kinds of ciphers by
statistical analysis. Consider again simple substitution.
The first thing a cryptographer does with an intercepted
cryptogram is to make a frequency count. If the cryptogram
contains say 200 letters it is safe to assume that few, if
any, letters are out of their frequency groups, this being
a division into 4 sets of well defined frequency limits. The
log of the number of keys within this limitation may be
calculated as

$$\log (2! \; 9! \; 9! \; 6!) = 14.28$$

and the simple frequency count thus reduces the key uncertainty
by 12 digits, a tremendous gain.
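
The arithmetic of the frequency-group reduction can be confirmed directly. The sketch assumes, as the text does, four groups of 2, 9, 9 and 6 letters within which the letters may still be permuted freely; the variable names are illustrative.

```python
import math

group_sizes = [2, 9, 9, 6]                 # four frequency groups covering the 26 letters

# Keys still possible: any arrangement of the letters inside each group.
remaining_digits = sum(math.log10(math.factorial(g)) for g in group_sizes)
total_digits = math.log10(math.factorial(26))

print(f"key uncertainty after the count: {remaining_digits:.2f} digits")
print(f"reduction: about {total_digits - remaining_digits:.0f} digits")
```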
In general, a statistical attack proceeds as follows.
A certain statistic is measured on the intercepted cryptogram
E. This statistic is such that for all reasonable M it assumes
about the same value, $S_K$, the value depending only on the
particular key K that was used. The value thus obtained serves
to limit the possible keys to those which would give values
of S in the neighborhood of that observed. A statistic which
does not depend on K or which varies as much with M as with K
is not of value in limiting K. Thus in transposition ciphers,
the frequency count of letters gives no information about K -
every K leaves this statistic the same. Hence one can make
no use of a frequency count in breaking transposition ciphers.
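
The remark about transposition can be illustrated with a small experiment. The block transposition of period 4, the Caesar substitution used for contrast, and the sample message are all assumptions chosen for brevity.

```python
from collections import Counter
from itertools import permutations

def transpose(text, perm):
    """Block transposition: reorder the letters inside each block of len(perm)."""
    d = len(perm)
    return "".join(text[b + p] for b in range(0, len(text), d) for p in perm)

def caesar(text, shift):
    """A very simple substitution, used only for contrast."""
    return "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in text)

message = "DEFENDTHEEASTWALLOFTHECASTLE"    # 28 letters, a multiple of the block length 4

# Every transposition key leaves the frequency count unchanged ...
same_for_all_keys = all(
    Counter(transpose(message, perm)) == Counter(message)
    for perm in permutations(range(4))
)
# ... whereas the count does depend on the key of a substitution.
differs_with_key = Counter(caesar(message, 1)) != Counter(caesar(message, 2))

print(same_for_all_keys, differs_with_key)
```

The frequency count therefore says nothing about which of the 24 transposition keys was used, while it does distinguish between substitution keys.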
More precisely one can ascribe a "solving power" to
a given statistic S. For each value of S there will be a
conditional equivocation of the key $Q_S(K)$, the equivocation
when S has its particular value and that is all that is known
concerning the key. The weighted mean of these values

$$\sum_S P(S) \, Q_S(K)$$

gives the mean equivocation of the key when S is known, P(S)
being the a priori probability of the particular value S. The
key size |K| less this mean equivocation measures the "solving
power" of S.
In a strongly ideal cipher all statistics of the
cryptogram are independent of the particular key used. This is
the measure preserving property of $T_j T_k^{-1}$ on the E space
or $T_j^{-1} T_k$ on the M space mentioned above.
There are good and poor statistics, just as there are
good and poor methods of trial and error. Indeed the trial and
error testing of a hypothesis is a type of statistic, and what
was said above regarding the best types of trials holds
generally. A good statistic for solving a system must have the
following properties:

1. It must be simple to measure.
2. It must depend more on the key than on the message
   if it is meant to solve for the key. The variation
   with M should not mask its variation with K.
3. The values of the statistic that can be "resolved"
   in spite of the "fuzziness" produced by variation
   in M should divide the key space into a number of
   subsets of comparable probability, with the statistic
   specifying the one in which the correct key
   lies. The statistic should give us sizable
   information about the key, not a tiny fraction of an
   alternative.
4. The information it gives must be simple and usable.
   Thus the subsets in which the statistic locates the
   key must be of a simple nature in the key space.

Frequency count for simple substitution is an
example of a very good statistic.
Two methods (other than recourse to ideal systems)
suggest themselves for frustrating a statistical analysis.
These we may call the methods of diffusion and confusion. In
the method of diffusion the statistical structure of M which
leads to its redundancy is "dissipated" into long range
statistics - i.e., into statistical structure involving long
combinations of letters in the cryptogram. The effect here is
that the enemy must intercept a tremendous amount of material
to tie down this structure, since the structure is evident only
in blocks of very small individual probability. Furthermore,
even when he has sufficient material, the analytical work
required is much greater since the redundancy has been diffused
over a large number of individual statistics. An example of
diffusion of statistics is operating on a message
$M = m_1, m_2, m_3, \ldots$ with a "smoothing" operation, e.g.,

$$y_n = \sum_{i=1}^{s} m_{n+i} \pmod{26},$$

adding s successive letters of the message to get a letter $y_n$.
One can show that the redundancy of the y sequence is the same
as that of the m sequence, but the structure has been dissipated.
Thus the letter frequencies in y will be more nearly equal than
in m, the digram frequencies also more nearly equal, etc.
Indeed any reversible operation which produces one letter out for
each letter in and does not have an infinite "memory" has an
output with the same redundancy as the input. The statistics
can never be eliminated without compression, but they can be
spread out.
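
The smoothing operation can be sketched as follows. Letters are numbered A = 0 to Z = 25 and the window is taken as the s letters starting at each position, which is an indexing convention assumed here; the sample text is arbitrary.

```python
from collections import Counter

def smooth(message, s):
    """Each output letter is the sum, modulo 26, of s successive message letters."""
    nums = [ord(c) - ord("A") for c in message]
    return [sum(nums[n:n + s]) % 26 for n in range(len(nums) - s + 1)]

text = "THEENEMYWILLATTACKTHEEASTERNGATEATDAWNWITHALLTHEIRFORCES"
plain_counts = Counter(text)
smoothed_counts = Counter(smooth(text, 5))

print("most common letters in m:", plain_counts.most_common(3))
print("most common letters in y:", smoothed_counts.most_common(3))
```

On a long sample of English one would expect the counts for y to be noticeably flatter than those for m; a passage as short as this one can only suggest the effect.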
The method of confusion is to make the relation between
the simple statistics of E and the simple description of K a very
complex and involved one. In the case of simple substitution it
was easy to describe the limitation of K imposed by the letter
frequencies of E. If the connection is very involved and
confused the enemy can still evaluate a statistic $S_1$, say,
which limits the key to a region of the key space. This
limitation, however, is to some complex region $R_1$ in the
space - folded over many times - and he has a difficult time
making use of it. A second statistic $S_2$ limits K still
further to $R_2$, hence it lies in the intersection region
$R_1 R_2$, but this does not help much because it is so difficult
to determine just what the intersection is.
To be more precise let us suppose the key space has
certain "natural coordinates" $k_1, k_2, \ldots, k_p$ which he
wishes to determine. He measures a set of statistics
$s_1, s_2, \ldots, s_n$ and these are sufficient to determine the
$k_i$. However, in the method of confusion, the equations
connecting these sets of variables are involved and complex.
We have, say,

$$\begin{aligned}
f_1(k_1, k_2, \ldots, k_p) &= s_1 \\
f_2(k_1, k_2, \ldots, k_p) &= s_2 \\
&\;\;\vdots \\
f_n(k_1, k_2, \ldots, k_p) &= s_n
\end{aligned}$$

and all the $f_i$ involve all the $k_i$. The cryptographer must
solve this system simultaneously - a difficult job. In the
simple (not confused) cases the functions involve only a
small number of the $k_i$ - or at least some of these do. One
first solves the simpler equations, evaluating some of the
$k_i$, and substitutes these in the more complicated equations.
The conclusion here is that for a good ciphering
system steps should be taken either to diffuse or confuse
the redundancy (or both).

36. The Probable Word Method
One of the most powerful tools for breaking ciphers
is the use of probable words. The probable words may be
words or phrases expected in the particular message due to
its source, or they may merely be common words or syllables
which occur in any text in the language, such as the, and,
tion, that, etc.
In general, the probable word method is used as
follows. Assuming a probable word to be at some point in
the clear, the key or a part of the key is determined. This
is used to decipher other parts of the cryptogram and provide
a consistency test. If the other parts come out in clear,
the assumption is justified.
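
A minimal sketch of the probable word method, applied to a simple Vigenere, may make the procedure concrete. The message, the key LEMON, the probable word and the assumption that the period (5) is already known are all hypothetical choices for this example.

```python
A = ord("A")

def vigenere(text, key, sign=1):
    """Add (sign=1) or subtract (sign=-1) the repeating key, modulo 26."""
    return "".join(
        chr((ord(c) - A + sign * (ord(key[i % len(key)]) - A)) % 26 + A)
        for i, c in enumerate(text)
    )

def key_fragment(cryptogram, probable_word, position):
    """Key letters implied by assuming probable_word starts at `position` in the clear."""
    return "".join(
        chr((ord(e) - ord(m)) % 26 + A)
        for e, m in zip(cryptogram[position:], probable_word)
    )

def consistent(fragment, period):
    """A fragment longer than the period must repeat itself after `period` letters."""
    return all(fragment[i] == fragment[i + period] for i in range(len(fragment) - period))

clear = "ATTACKATDAWNTHEENEMYISATTHEGATE"
cryptogram = vigenere(clear, "LEMON")

# Try the probable word at a right and at a wrong position.
good = key_fragment(cryptogram, "THEENEMY", 12)
bad = key_fragment(cryptogram, "THEENEMY", 3)
print(good, consistent(good, 5))
print(bad, consistent(bad, 5))

# Rebuild the key from the consistent fragment and decipher the other parts.
key = [""] * 5
for i, letter in enumerate(good[:5]):
    key[(12 + i) % 5] = letter
recovered = vigenere(cryptogram, "".join(key), sign=-1)
print("".join(key), recovered)
```

Only the correct placement gives a fragment that repeats with the period, and the key it implies brings the rest of the cryptogram out in clear.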
There are few of the classical type ciphers that
use a small key and can resist long under a probable word
analysis. From a consideration of this method we can frame
a test of ciphers which might be called the acid test. It
applies only to ciphers with a small key (less than say 50
digits), applied to natural languages, and not using the
ideal method of gaining secrecy. The acid test is this:
How difficult is it to determine the key or a part of the
key knowing a sample of message and corresponding cryptogram?
Any system in which this is easy cannot be very resistant,
for the cryptanalyst can always make use of probable words,
combined with trial and error, until a consistent solution
is obtained.
The conditions on the size of the key make the
amount of trial and error small, and the condition about
ideal systems is necessary, since these automatically give
consistency checks. The existence of probable words and
phrases is implied by the condition of natural languages.
Conversely, it seems reasonable that if the key is difficult
to obtain, knowing a text and its cryptogram, then the
system should be strong.
Note that this requirement by itself is not
contradictory to the requirements that enciphering and
deciphering be simple processes. Using functional notation we
have for enciphering

$$E = f(K, M)$$

and for deciphering

$$M = g(K, E).$$

Both of these may be simple operations on their arguments
without the third equation

$$K = h(M, E)$$

being simple.
We may also point out that in investigating a new type
of ciphering system one of the best methods of attack is to
consider how the key could be determined if a sufficient
amount of M and E were given.

With a small key, the work required to solve a
system, given a large amount of data, may be expected to be
not more than a few orders of magnitude greater than the
work required to obtain the key from a small amount of data
when both M and E are known.
The same principle of confusion can be (and must be)
used here to create difficulties for the cryptanalyst.
Given $M = m_1, m_2, \ldots, m_s$ and $E = e_1, e_2, \ldots, e_s$
the cryptanalyst can set up equations for the different key
elements $k_1, k_2, \ldots, k_r$ (namely the enciphering
equations)

$$\begin{aligned}
e_1 &= f_1(m_1, m_2, \ldots, m_s; \; k_1, \ldots, k_r) \\
e_2 &= f_2(m_1, m_2, \ldots, m_s; \; k_1, \ldots, k_r) \\
&\;\;\vdots \\
e_s &= f_s(m_1, m_2, \ldots, m_s; \; k_1, \ldots, k_r)
\end{aligned}$$

All is known, we assume, except the $k_i$. Each of these
equations should therefore be complex in the $k_i$, and involve
many of them. Otherwise the enemy can solve the simple ones
and then the more complex ones by substitution.
From the point of view of increasing confusion, it
is desirable to have the $f_i$ involve several $m_i$, especially
if these are not adjacent and hence less correlated. This
introduces the undesirable feature of error propagation,
however, for then each $e_i$ will generally affect several $m_i$
in deciphering, and an error will spread to all these.

We conclude that much of the key should be used in
an involved manner in obtaining any cryptogram letter from
the message to keep the work characteristic high. Further, a
dependence on several uncorrelated $m_i$ is desirable, if some
propagation of error can be tolerated. We are led by all
three of the arguments of these sections to consider "mixing
transformations."

37. Mixing Transformations
A notion that has proven valuable in certain branches
of probability theory is the concept of a "mixing
transformation." Suppose we have a probability or measure space
$\Omega$, and a measure preserving transformation T of the space
into itself, i.e., a transformation such that the measure of a
transformed region TR is equal to the measure of the initial
region R. The transformation is called mixing if for any
function f defined over the space, and any region R,

$$\lim_{n \to \infty} \int_{T^n R} f(P) \, dP = \int_R dP \int_\Omega f(P) \, dP.$$

This means that any initial region of the space R under
successive applications of T is mixed into the entire space
$\Omega$ with uniform density. In general $T^n R$ becomes a
region consisting of a large number of thin filaments spread
throughout the region. As n increases the filaments become finer
and their density more nearly constant.
An example of a mixing transformation is shown in
Fig. 21. Here measure is identified with Euclidean area.
The space is the triangle and $T^\lambda P$ is the point
$\lambda$ units of distance above point P providing this does not
go outside the triangle. When the top of the triangle is reached
a point is transferred first to the point directly beneath,
and then over to the right an irrational fraction of the
base width. If this carries the point beyond the right edge
the extra distance is measured from the left edge. Successive
transforms of a square region are shown in Fig. 21. For
$\lambda$ very large the square is turned into a uniform grating
of nearly parallel thin strips covering the triangle.
A mixing transformation in this precise sense can
occur only in a space with an infinite number of points, for
in a finite point space the transformation must be periodic.
Speaking loosely, however, we can think of a mixing
transformation as one which distributes any reasonably cohesive
region in the space fairly uniformly over the entire space.
If the first region could be described in simple terms, the
second would require very complex ones. In the case of
cryptographic interest, the original region is all of a
certain simple statistical structure - after the mix the region
is distributed and the structure diffused and confused.

Good mixing transformations are often formed by
repeated products of two simple non-commuting operations.
See for example the mixing of pastry dough discussed by Hopf.*
The dough is first rolled out into a thin slab, then folded
over, then rolled, and then folded again, etc.
In a good mixing transformation of a space with
natural coordinates $X_1, X_2, \ldots, X_S$ the point $X_i$ is
carried by the transformation into a point $X_i'$, with

$$X_i' = f_i(X_1, X_2, \ldots, X_S), \qquad i = 1, 2, \ldots, S$$

and the functions $f_i$ are complicated, involving all the
variables in a "sensitive" way. A small variation of any one,
$X_3$, say, changes all the $X_i'$ considerably. If $X_3$ passes
through its range of possible variation the point $X_i'$ traces
a long winding path around the space.
Various methods of mixing applicable to statistical
sequences of the type found in natural languages can be
devised. One which looks fairly good is to follow a
preliminary transposition by a sequence of alternating
substitutions and simple linear operations, adding adjacent
letters mod 26 for example. Thus

$$H = LSLSLT$$

where T is a transposition, L is a linear operation, and S is
a substitution.

*E. Hopf, On Causality, Statistics and Probability, Journal of
Math. and Physics, V.13, pp. 51-102, 1934.
38. Ciphers of the Type $T_k H S_j$

Suppose that H is a good mixing transformation that
can be applied to sequences of letters and that $T_k$ and $S_j$
are any two simple families of transformations, i.e., two simple
ciphers, which may be the same. For concreteness we may think
of them as both simple substitutions.
It appears that the cipher THS will be a very good
ciphering system from the standpoint of its work characteristic.
In the first place it is clear on reviewing our arguments about
statistical methods that no simple statistics will give
information about the key - any significant statistics derived
from E must be of a highly involved and very sensitive type -
the redundancy has been both diffused and confused by the mixing
H. Also probable words lead to a complex system of equations
involving all parts of the key (when the mix is good), which
must be solved simultaneously. The bad features of such a system
are propagation of errors and complexity of operations, both of
which get worse as the mixing of H gets better.
It is interesting to note that if the cipher T is
omitted the remaining system is similar to S and thus no
stronger. The enemy merely "unmixes" the cryptogram by
application of $H^{-1}$ and then solves. If S is omitted the
remaining system is much stronger than T alone if the mix is
good, but still not comparable to THS.
The basic principle here of simple ciphers separated
by a mixing transformation can of course be extended. For
example one could use

$$T_k H_1 S_j H_2 R_i$$

with two mixes and three simple ciphers. One can also simplify
by using the same ciphers, and even the same keys (inner
product) as well as the same mixing transformations. This
might well simplify the mechanization of such systems.
The mixing transformation which separates the two
(or more) appearances of the key acts as a kind of barrier for
the enemy - it is easy to carry a known element over this
barrier but an unknown (the key) does not go easily.
By supplying two sets of unknowns, the key for S and
the key for T, and separating them by the mixing transformation
H we have "tangled" the unknowns together in a way that makes
solution very difficult.
Although systems constructed on this principle
would be extremely safe they possess one grave disadvantage.
If the mix is good then the propagation of errors is bad.
A transmission error of one letter will affect several
letters on deciphering.

39. The Compound Vigenere
In the compound Vigenere several keys of length $d_1$,
$d_2, \ldots, d_S$ are written under the message and added to it
modulo 26 to obtain the cryptogram. The result is a Vigenere
with key of special type, whose repetition is of period d, the
least common multiple of $d_1, d_2, \ldots, d_S$. If we have
three keys of periods 2, 3, 5 the total period is 30 and the
total key size (2+3+5) x 1.41 = 14.1 digits. The situation is
then

    M   = m1 m2 m3 m4 m5 m6 ...
    K1  = a1 a2 a1 a2 a1 a2 ...
    K2  = b1 b2 b3 b1 b2 b3 ...
    K3  = c1 c2 c3 c4 c5 c1 ...
    E   = e1 e2 e3 e4 e5 e6 ...

with

    e1 = m1 + a1 + b1 + c1
    e2 = m2 + a2 + b2 + c2
    etc.

If we assume M and E known then, letting $h_i = e_i - m_i$,

    a1 + b1 + c1 = h1        a2 + b3 + c1 = h6
    a2 + b2 + c2 = h2        a1 + b1 + c2 = h7
    a1 + b3 + c3 = h3        a2 + b2 + c3 = h8
    a2 + b1 + c4 = h4        a1 + b3 + c4 = h9
    a1 + b2 + c5 = h5        a2 + b1 + c5 = h10
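
The pattern of key letters entering each equation follows mechanically from the periods, as the sketch below shows for periods 2, 3 and 5. The function names and the sample key values are assumptions for illustration; `math.lcm` requires Python 3.9 or later.

```python
from math import lcm

def compound_vigenere(message, keys):
    """Add every key, each repeated at its own period, to the message modulo 26."""
    return [(m + sum(k[n % len(k)] for k in keys)) % 26 for n, m in enumerate(message)]

def key_equations(periods, count):
    """1-based indices of the key letters that enter h_1 ... h_count."""
    return [tuple(n % d + 1 for d in periods) for n in range(count)]

periods = (2, 3, 5)
for i, (a, b, c) in enumerate(key_equations(periods, 10), start=1):
    print(f"a{a} + b{b} + c{c} = h{i}")
print("total period:", lcm(*periods))

# With M and E known, h_i = e_i - m_i exposes the sum of the key letters used.
keys = [[3, 7], [1, 4, 9], [2, 5, 8, 11, 20]]          # hypothetical key values
message = [19, 7, 4, 4, 13, 4, 12, 24, 8, 18]          # THEENEMYIS as numbers
cryptogram = compound_vigenere(message, keys)
h = [(e - m) % 26 for e, m in zip(cryptogram, message)]
print(h)
```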
These equations are easily solved for the key, although not as
easily as in the simple Vigenere or other simple ciphers. As
the number of constituent periods increases the solution
becomes more involved and time consuming. In any case we have
a system of simultaneous equations each involving S of the
total of $B = \sum_{i=1}^{S} d_i$ unknowns. The unicity point
will occur at about 2B letters and if several times this amount
of material is intercepted no great difficulty should be
encountered in breaking the cipher, providing S is not more than
say 6 or 8. With the first 9 primes as periods we have a key
size of 100 letters or about 141 digits, the unicity distance is
about 200 letters and the key does not repeat for 223,092,870
letters. This system, although much better than such methods as
simple substitution, transposition and simple Vigenere with
equivalent key size, does not utilize the available key fully in
making the cryptanalyst work for the solution. The equations
only involve S of the B key unknowns and those in a simple
fashion. The equations easily combine and reduce to eliminate
unknowns. If a large amount of material is available, compared
to the unicity distance, particular sets of equations can be
combined to eliminate unknowns very easily. The system possesses
the important advantage, however, of not expanding errors. One
incorrect letter of cryptogram produces one incorrect letter of
deciphered text.
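
The figures quoted for the first nine primes can be verified in a few lines. The variable names are illustrative, and the unicity distance is taken as 2B, following the rough rule stated above; `math.lcm` requires Python 3.9 or later.

```python
from math import lcm, log10

periods = [2, 3, 5, 7, 11, 13, 17, 19, 23]     # the first 9 primes

key_letters = sum(periods)                      # B, the total number of key unknowns
key_digits = key_letters * log10(26)            # key size in decimal digits
unicity = 2 * key_letters                       # rough unicity distance of 2B letters
repeat_length = lcm(*periods)                   # letters before the compound key repeats

print(key_letters, round(key_digits), unicity, f"{repeat_length:,}")
```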

By relatively simple changes this system could be
strengthened considerably. If the equations for the key
elements (with M and E known) could be made into higher degree
equations rather than linear ones the difficulty of solution
would increase tremendously. This could easily be done in
a mechanical device by successive multiplications (mod 26)
of the key letters according to some prearranged scheme.

40. Incompatibility of the Criteria for Good Systems
The five criteria for good secrecy systems given in
section 12 appear to have a certain incompatibility when
applied to a natural language with its complicated statistical
structure. With artificial languages having a simple
statistical structure it is possible to satisfy all requirements
simultaneously, by means of the ideal type ciphers. In natural
languages it seems that a compromise must be made and the
valuations balanced against one another with a view toward
the particular application.
If any one of the five criteria is dropped, the
other four can be satisfied fairly well, as the following
examples show.

1. If we omit the first requirement (amount of secrecy)
   any simple cipher such as simple substitution will do.
   In the extreme case of omitting this condition
   completely, no cipher at all is required and one sends
   the clear.
2. If the size of the key is not limited the Vernam
   system can be used.
3. If complexity of operation is not limited, various
   extremely complicated types of enciphering process
   can be used. The modified compound Vigenere described
   above with many different periods compounded is fairly
   satisfactory as an example here, although it falls
   down somewhat on the key size condition. Ideal systems
   and enciphered codes are also fair examples although
   not too good from the propagation of error point of
   view.
4. If we omit the propagation of error condition systems
   of the type THS would be very good, although somewhat
   complicated.
5. If we allow large expansion of message, various systems
   are easily devised where the "correct" message is mixed
   with many "incorrect" ones (misinformation). The key
   determines which of these is correct.
A rough argument for the incompatibility of the five
conditions may be given as follows.

From condition 5, secrecy systems essentially as
studied in this paper must be used; i.e., no great use of nulls,
etc. Perfect and ideal systems are excluded by condition 2 and
by 3 and 4, respectively. The high secrecy required by 1 must
then come from a high work characteristic, not from a high
equivocation characteristic. If the key is small, the system
simple, and the errors do not propagate, probable word
methods will generally solve the system fairly easily, since
we then have a fairly simple system of equations for the key.
This reasoning is too vague to be conclusive, but the
general idea seems quite reasonable. Perhaps if the various
criteria could be given quantitative significance, some sort of
an exchange equation could be found involving them and giving
the best physically compatible sets of values. The two most
difficult to measure numerically are the complexity of
operations, and the complexity of statistical structure of the
language.

Appendix 1
Deduction of $-\sum p_i \log p_i$

It will be shown that the measure of choice
$-\sum p_i \log p_i$ is a logical consequence of three quite
reasonable assumptions about the desired properties of such a
measure. The three assumptions are:

(1) There exists a function $C(p_1, p_2, \ldots, p_n)$,
continuous in the $p_i$, measuring the amount of "choice" when
there are n possibilities with probabilities $p_i$.

(2) C has the property that if a given choice be
broken down into two successive choices the total amount of
choice is the weighted sum of the individual choices. For
example, suppose the choice is from 4 possibilities A, B, C, D
with probabilities .1, .2, .3, .4. This can be broken down into
a preliminary choice between the pair A, B and the pair C, D.
Pair A, B has a total probability .1 + .2 = .3 and pair C, D
probability .3 + .4 = .7. If pair A, B is chosen a second choice
between A and B must be made with probabilities
$\frac{.1}{.1 + .2} = \frac{1}{3}$ and
$\frac{.2}{.1 + .2} = \frac{2}{3}$. If pair C, D is chosen a
second choice between C and D must be made with probabilities
$\frac{3}{7}$ and $\frac{4}{7}$. Thus broken down we have a
preliminary amount of choice C(.3, .7) and .3 of the time a
secondary choice of $C(\frac{1}{3}, \frac{2}{3})$ while .7 of the
time the secondary choice is $C(\frac{3}{7}, \frac{4}{7})$. Our
condition requires that the total choice C(.1, .2, .3, .4) be
the same as the weighted sum of the different choices when
decomposed, weighted in accordance with the frequency of
occurrence. Thus we require in this case

$$C(.1, .2, .3, .4) = C(.3, .7) + .3 \, C\!\left(\tfrac{1}{3}, \tfrac{2}{3}\right) + .7 \, C\!\left(\tfrac{3}{7}, \tfrac{4}{7}\right).$$

(3) If $A(n) = C(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n})$,
the choice when there are n equally likely possibilities, then
A(n) is monotonic increasing in n.

Theorem. Under these three assumptions

$$C(p_1, p_2, \ldots, p_n) = -K \sum p_i \log p_i$$

where K is a positive constant.
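
The decomposition example under assumption (2) can be checked numerically for the form asserted by the theorem. The sketch below takes K = 1 and logarithms to base 2, both arbitrary choices of unit; the function name `choice` is illustrative.

```python
import math

def choice(*probabilities):
    """-sum p_i log2 p_i: the theorem's measure with K = 1 and logarithms to base 2."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

direct = choice(.1, .2, .3, .4)
decomposed = choice(.3, .7) + .3 * choice(1/3, 2/3) + .7 * choice(3/7, 4/7)
print(direct, decomposed)

# Assumption (3): the choice among n equally likely possibilities grows with n.
equal_choice = [choice(*[1 / n] * n) for n in range(1, 9)]
print([round(a, 3) for a in equal_choice])
```

The two printed values agree to within rounding error, and the second list is increasing, as assumptions (2) and (3) require.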
From condition (2) we can decompose a choice from $s^m$ equally
likely possibilities into a series of m choices each from s
equally likely possibilities and obtain

$$A(s^m) = m A(s).$$

Similarly

$$A(t^n) = n A(t).$$

We can choose n arbitrarily large and find an m to satisfy

$$s^m \le t^n < s^{m+1}.$$

Thus, taking logarithms and dividing by n log s,

$$\frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n} \quad \text{or} \quad \left| \frac{m}{n} - \frac{\log t}{\log s} \right| < \epsilon$$

where $\epsilon$ is arbitrarily small.
Now from the monotonic property of A(n)

$$A(s^m) \le A(t^n) \le A(s^{m+1})$$

$$m A(s) \le n A(t) \le (m + 1) A(s).$$

Hence, dividing by n A(s),

$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n} \quad \text{or} \quad \left| \frac{m}{n} - \frac{A(t)}{A(s)} \right| < \epsilon$$

$$\left| \frac{A(t)}{A(s)} - \frac{\log t}{\log s} \right| < 2\epsilon, \qquad A(t) = K \log t$$

where K must be positive to satisfy (3).
Now suppose we have a choice from n possibilities with commensurable
probabilities $p_i = \frac{n_i}{\sum n_i}$ where the $n_i$ are integers.
We can break down a choice from $\sum n_i$ possibilities into a choice
from n possibilities with probabilities $p_1, \ldots, p_n$ and then, if
the ith was chosen, a choice from $n_i$ with equal probabilities.
Using condition (2) again, we equate the total choice from $\sum n_i$
as computed by two methods:
$$K \log \sum n_i = C(p_1, \ldots, p_n) + K \sum p_i \log n_i.$$
Hence
$$C = K\left[\sum p_i \log \sum n_i - \sum p_i \log n_i\right]
  = -K \sum p_i \log \frac{n_i}{\sum n_i} = -K \sum p_i \log p_i.$$
If the $p_i$ are incommensurable, they may be approximated by
rationals and the same expression must hold by our continuity
assumption. The choice of the coefficient K is a matter of
convenience and amounts to the choice of a unit of measure.
Appendix 2
Proof of Theorem 4
Select any message $M_1$ and group together all cryptograms that can
be obtained from $M_1$ by an enciphering operation $T_i$. Let this
class of cryptograms be $C_1'$. Group with $M_1$ all messages that can
be obtained from $M_1$ by $T_i^{-1} T_j M_1$, and call this class
$C_1$. The same $C_1'$ would be obtained if we started with any other
M in $C_1$ since
$$T_s T_j^{-1} T_i M_1 = T_l M_1.$$
Similarly the same $C_1$ would be obtained.
Choosing an M (if any exist) not in $C_1$ we construct $C_2$ and
$C_2'$ in the same way. Thus we obtain the residue classes with
properties (1) and (2). Let $M_1$ and $M_2$ be in $C_1$ and suppose
$$M_2 = T_1^{-1} T_2 M_1.$$
If $E_1$ is in $C_1'$ and can be obtained from $M_1$ by
$$E_1 = T_\alpha M_1 = T_\beta M_1 = \cdots$$
then
$$E_1 = T_\alpha T_2^{-1} T_1 M_2 = T_\beta T_2^{-1} T_1 M_2 = \cdots
      = T_\lambda M_2 = T_\mu M_2 = \cdots$$
Thus each $M_i$ in $C_1$ transforms into $E_1$ by the same number of
keys. Similarly each $E_i$ in $C_1'$ is obtained from any M in $C_1$
by the same number of keys. It follows that this number of keys is a
divisor of the total number of keys and hence we have properties (3)
and (4).
Appendix 3
Equivocation of Message for Random Cipher
As before let $M_1, \ldots, M_s$ be high probability messages
and $M_{s+1}, \ldots, M_H$ have zero probability. Let $P(m_i, m)$ be
the probability of just $m_i$ lines going from a particular E
to a particular high probability M, say $M_i$, with a total of m
lines to all high probability M. Then
$$P(m_i, m) = \binom{k}{m}\binom{m}{m_i}
  \left(\frac{1}{H}\right)^{m_i}
  \left(\frac{s-1}{H}\right)^{m-m_i}
  \left(1-\frac{s}{H}\right)^{k-m}.$$
The probability of intercepting an E with m lines t
bility M's la:^ >
k-n
' ■ -
The Q(M) expected can be thought of as contributed to by
various Mi .in the high probability group. Thus Ml contri
. mi mi , m
- log — = ■ —i log —
m xue m m 6 mi
if there are mi lines to Mi and a total of m to high pro^
M's. The expected Q is then
(MM) - a S miEm PCj.m) §j SL log S_
The factor H sums over the various Ei and the S sums ovei
different Ml,(i, l>t s) • Hence,
Q(M) - I £ P(mi,m) mi [ log m * log mj
the term y
i - v.- ■ ,. ■
V
E P (mi,m) mx
summed on mi* gives the expected mi, when m lines^go to h
probability. Mgt 1*©,, m/a, Henoath'e first term is
• •* * •»:.-> fx*. ■*'■';
JL £ m P (m) log m * Q(K)
m
by our previous work. The second term is
• JSP (mj., m) mi log mi
If the expected $m_i$ is $\ll 1$ this term is small since it vanishes
for $m_i = 0$ or 1. The expected $m_i$ is $k/H$. Thus beyond this
point $Q(M)$ approaches closely to $Q(K)$. The point in question
is where JK| • |Mpf - RqN •
or
IK
If the expected »1 the log mi can be taken out as log Hi «*
log k/Hi and we have' , - :
log =y £ P>j
' ' ^ -log § - }Mo1 r .|K!:^-r •
In' this "region then • - V " '. ' ; "y
Q(1C) • |M0| - id + d(K)
but here Q(K) - ]k| - |M0| + : • Jill, and therefore
q(M) - |m[ - RN . - '
In the transition region Ei is about 1 and Iff will in
ordinary cases be very large. It is admissable then to replace
?(mi; m) by P(mi) , since this will not depend on m to any extent
except for values of m of very small probability. Thus we obtain
for this region
iiU) - - 3 £ p(mi) mi log
The "sum has the same "form as our expression for Q{K) but with
l/H In place of s/H» The calculations for Q(K) can be used,
therefore % with only a change of '< the^U scale byja factor of
. '•' ' '"• ^>-"~" ^"'ft *" •' ' i. ' J}'*'
Appendix 4
Key Appearance in Simple Substitution with Independent Letters
If successive letters are chosen independently and the different
letters have probabilities $p_1, p_2, \ldots, p_S$, we can calculate
the expected number of different letters when N letters have been
intercepted. It is
$$L(N) = S - \sum (1 - p_i)^N.$$
To prove this, imagine all possible sequences of N letters written
down, each with a frequency corresponding to its probability, giving
a total of say A sequences. Letter 1 does not appear in
$(1 - p_1)^N A$ of these; letter 2 does not appear in $(1 - p_2)^N A$,
etc. Therefore the total number of letters missing from sequences is
$$A \sum (1 - p_i)^N.$$
Dividing by A gives us by definition the expected number of missing
letters from a random sequence, $\sum (1 - p_i)^N$. The number of
different letters expected in a sequence is the total number of
letters S minus this, giving the desired result.
If all the $p_i$ are equal this reduces to $S - S(1 - \frac{1}{S})^N$,
an exponential approach to S. In the general case there is a series
of exponentials with different time constants, corresponding to
different $p_i$, which are added to give $L(N)$.
With the frequencies of normal English used for the $p_i$, we obtain
the curve shown in Fig. 25, along with an experimental curve. The
small discrepancy can be attributed to influences of nearby letters.
(In English there is less tendency to double letters than there would
be if the letters were independent but with the same probabilities.
For English the probability of a doubled digram is
$$\sum P(i, i) = .0315$$
while if letters were independent it would be
$$\sum p_i^2 = .0670.)$$
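
The counting argument of this appendix can be checked on a small
alphabet. The sketch below is illustrative only: the function names
are invented here, and the three-letter probabilities are an arbitrary
assumption, not English frequencies.

```python
import itertools
import math

def expected_distinct(probs, n):
    """Closed form from the appendix: S - sum((1 - p_i) ** n)."""
    return len(probs) - sum((1 - p) ** n for p in probs)

def expected_distinct_bruteforce(probs, n):
    """Enumerate every sequence of n letters, weighting each by its
    probability, and average the number of different letters in it."""
    total = 0.0
    for seq in itertools.product(range(len(probs)), repeat=n):
        weight = math.prod(probs[i] for i in seq)
        total += weight * len(set(seq))
    return total

def doubled_letter_probability(probs):
    """Probability that two independent successive letters coincide."""
    return sum(p * p for p in probs)

probs = [0.5, 0.3, 0.2]  # hypothetical three-letter alphabet
print(expected_distinct(probs, 4), expected_distinct_bruteforce(probs, 4))
```

The enumeration is exponential in N and is meant only to confirm the
closed form on toy cases; for real letter frequencies only
`expected_distinct` is practical.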
Appendix 5
A Theoretical Case Where All Invariant Statistics of E Are
Independent of K.
By an invariant statistic of a sequence of letters
$E = \ldots, m_{-2}, m_{-1}, m_0, m_1, m_2, m_3, \ldots$ we will mean a
statistic which is averaged along the length of the sequence E. More
precisely a statistic of the form:
$$\lim_{n \to \infty} \frac{1}{2n+1}\left[F(E_{-n}) + \cdots + F(E_{-1})
  + F(E) + F(E_1) + F(E_2) + \cdots + F(E_n)\right]$$
where F is any function whose argument is a possible sequence, and
$E_{\pm n}$ is the sequence E shifted n letters to the right or left.
Such statistics as the relative frequency of a given letter, of
a given n-gram, transition frequencies, and frequencies with
which letter i is followed by letter j at a distance n are all
invariant.
We will describe a system in which every invariant
statistic which the cryptanalyst can construct from the (infinite)
intercepted E is independent of both K and M, and thus gives no
information to him. This effect and still more occurs with the
ideal ciphers of course, but here it is obtained independently of
the original message statistics and without any matching of the
cipher to the language.
Let N be a "random" sequence of letters;
$$N = \ldots, n_{-2}, n_{-1}, n_0, n_1, n_2, n_3, \ldots$$
this is supposedly a known sequence (to the enemy) and thus a
part of the system, not of the key. Apply any simple cipher to
the message and then add N letter by letter to the result (mod
26). The "sum" is the enciphered message. It is evident that
any invariant statistic of E will be (with probability 1) the
same as that for a random sequence. Hence it is independent
of both K and M.
We need hardly add that such a system is easily
broken; the enemy merely subtracts N from E and then solves
the simple residual cipher, which may often be done with
invariant statistics.
Appendix 6
Maximum Repetition Rate in Compound Systems for a Given Total Key Size
We consider briefly the question of how to arrange the
component periods in a compound Vigenère or transposition in order
to obtain the longest period for a given total key size. If the
component periods are $P_1, P_2, \ldots, P_s$ it is clear that they
must be coprime. Otherwise the total key, which is $\sum P_i$, could be
reduced without changing the period, which is the least common
multiple of the $P_i$, merely by deleting a factor which appears in
several of the $P_i$ from all but one. Also each P must be a power
of a prime, for if it contains two primes, it can be divided into
these parts, reducing the key and not affecting the period. Thus
the component periods are selections from the series of primes
and powers of primes
(A)  2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25, 27, ...
the selection being pairwise coprime.
It appears from empirical evidence that the best set
of component periods for a given total size S is found by the
following process.
1. Determine the largest M such that $\sum_{i=1}^{M} p_i \le S$ where
   the $p_i$ are the primes in increasing order. This is the
   maximum number of periods where the periods are coprime,
   and is the number of periods to be used.
2. Choose from the sequence A, M elements, consecutive
   except for the fact that no prime is represented more
   than once, the M elements being as great as possible
   with sum $\le S$.
3. If the sum is $< S$ move as many as possible of the
   elements in this block up a notch in the sequence,
   still satisfying the conditions on the sum and
   coprimality.
4. Repeat 3 to either part of the original block if
   possible. This process eventually ends and apparently
   gives the proper decomposition.
For example with S = 50, the sum of the first 6
primes is 41, of the first 7 is 58. Hence 6 periods
will be used. We have
   11 + 9 + 8 + 7 + 5 + 3 = 43
   13 + 11 + 9 + 8 + 7 + 5 = 53
hence we start with the block 11, 9, 8, 7, 5, 3. The
elements 11, 9, 8, 7 can be moved up a notch to give
   13 + 11 + 9 + 8 + 5 + 3 = 49
No further improvement seems possible, and we obtain
   P = 13 x 11 x 9 x 8 x 5 x 3 = 154,440
The products and sums of the first n primes are given below.
n        1  2   3    4     5      6       7        8          9
p_n      2  3   5    7    11     13      17       19         23
Sum      2  5  10   17    28     41      58       77        100
Product  2  6  30  210  2310  30030  510510  9699690  223092870
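
The table and step 1 of the process can be reproduced mechanically.
The sketch below is not the empirical procedure itself: it only
tabulates the sums and products of the first n primes and finds the
number of periods M for a given S. The function names are invented
for illustration, and the condition "sum not exceeding S" is an
assumption about how the inequality in step 1 is to be read.

```python
def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def prime_table(count):
    """Rows (n, p_n, sum of first n primes, product of first n primes)."""
    rows = []
    total, product = 0, 1
    for n, p in enumerate(first_primes(count), start=1):
        total += p
        product *= p
        rows.append((n, p, total, product))
    return rows

def number_of_periods(size):
    """Largest M whose first M primes have sum not exceeding `size`."""
    return sum(1 for (_, _, total, _) in prime_table(size) if total <= size)

for row in prime_table(9):
    print(row)
print(number_of_periods(50))
```

For S = 50 this gives M = 6, in agreement with the example, whose final
block 13, 11, 9, 8, 5, 3 has sum 49 and product 154,440.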
C. E. SHANNON
Att.
Figures 1-25.
[Fig. 6. Block diagram with labels: message source, message M,
encipherer T, cryptogram E, decipherer, message M, key source,
key K, enemy cryptanalyst.]
[Fig. 8. Diagram; legible label: $T^{-1}$.]
[Fig. 10. Pure system: message residue classes and cryptogram
residue classes.]
[Fig. 16. Calculation of Q curves.]
[Fig. 19. Curve plotted against N.]
[Fig. 20. Ideal characteristics, including the strongly ideal case;
horizontal axis N, number of letters.]
[Fig. 22.]
[Fig. 23.]
September 19 , l*4&-ll£S-CX3-yO
Introduction.
In classical mechanics one considers situations
where the state of a system is described by a set of numbers,
the coordinates of the phase space of the system, and the
dynamical behavior is controlled by a set of ordinary
differential equations. Such a system is entirely determinate; the
future is completely specified by the present state and the
dynamical equations, since these differential equations have,
in general, a unique solution passing through a given point.
In other branches of physics (heat flow, Brownian
motion, diffusion, etc.) there are situations which can be called
completely statistical. The path of a particle of gas is
described only statistically and no determinate or mean behavior
occurs. In this case one studies the flow of probability, which
is described by a partial differential equation of the heat
flow type.
The present memorandum discusses a partial differential
equation in which both effects occur: there is a definite
"mean" motion of a system, determinate in character, carrying
its representative point through phase space in the classical
manner, with a superimposed statistical effect continually
perturbing it from this path.
In such a case the future coordinates of the system
cannot be precisely predicted; only a probability distribution
function can be determined for the future time, whose value
times the volume element dV is the probability that the system
will be in the volume element dV around the point in question.
For a short time the system is substantially determinate, the
distribution being concentrated around a point which moves
according to the determinate part of the equation. As the
statistical effects come into play this distribution broadens out and
in general approaches a limiting distribution which is
independent of the initial state of the system.
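
The behavior described in the preceding paragraph can be illustrated
by simulation. The sketch below is an assumption-laden example, not
the memorandum's own calculation: it takes a one-dimensional system
with determinate motion dx/dt = -b x, adds small random kicks of
strength a at every time step, and follows an ensemble of systems that
all start at the same point. The names `drift_rate` (b) and
`diffusion` (a), the step size, and the ensemble size are all chosen
here for illustration.

```python
import math
import random

def simulate(x0, drift_rate, diffusion, dt, steps, paths, seed=0):
    """Follow `paths` independent systems from x0 for `steps` time steps.

    Each step applies the determinate motion -drift_rate * x * dt and a
    Gaussian disturbance of variance 2 * diffusion * dt.
    """
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * diffusion * dt)
    finals = []
    for _ in range(paths):
        x = x0
        for _ in range(steps):
            x += -drift_rate * x * dt + kick * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals

def mean_and_variance(samples):
    """Sample mean and (population) variance of a list of numbers."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v

# Shortly after the start the ensemble is still bunched near the
# determinate path; much later it has spread into a limiting shape.
early = mean_and_variance(simulate(3.0, 1.0, 0.5, 0.02, 5, 1000))
late = mean_and_variance(simulate(3.0, 1.0, 0.5, 0.02, 500, 1000))
print(early, late)
```

Under these assumptions the early mean stays close to the determinate
path and the early variance is small, while the late mean is near zero
and the late variance is near a/b = 0.5 whatever the starting point.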
In some respects the situation is similar to that in
quantum mechanics, where systems are described only by
probabilities (or more precisely by wave functions whose squared
amplitudes are probabilities). There is this difference however: in
quantum mechanics even the initial state cannot be precisely
described, due to the uncertainty principle. Conjugate variables
cannot both be measured simultaneously with exactness. In the systems
we consider here there are assumed to be no difficulties of this
nature; all coordinates can be simultaneously and precisely
measured. This corresponds to the difference in the fundamental
equation from that of quantum mechanics: Schrödinger's equation is
for the wave function $\psi$, while the equation considered here deals
directly with the probability density. Thus the present work is
adapted to "molar" statistical situations.
This sort of analysis may be expected to apply to
many problems where the actual situation is quite complicated
but a partial theoretical analysis is possible. This partial
analysis is used for the determinate part of the equation, and
the other complex disturbing effects are treated statistically.
Such situations may occur in economics, sociology, history, etc.,
as well as in many engineering and physical problems.
G. R. Stibitz in a series of memoranda has considered
a similar problem in connection with the stability of a periodically
closed servo system. In his case the phase space of the system
consisted of a set of discrete points, and the fundamental
equation is a difference equation. In the case considered here
(which was suggested by Stibitz' work) the variables are continuous
and a differential equation is involved.
In a determinate system with an n dimensional phase
space, whose motion is described by differential equations, we have
$$\frac{dx^i}{dt} = f^i(x^1, x^2, \ldots, x^n), \qquad i = 1, 2, \ldots, n \qquad (1)$$
where the $x^i$ are coordinates in the phase space and t is time.
If we start with a probability distribution of points in phase space
$$P(x^1, \ldots, x^n, t)$$
giving the probability density in the differential volume element
about $x^1, \ldots, x^n$ at time t, this distribution changes with time.
Its motion is described by the partial differential equation
or in tensor notation
This is evident if we think of P as a fluid density whose velocity
field is $f^i$.
Now suppose that as the representative point of the
system moves about the phase space it is continually subject to
small disturbances, which are of a probability type. Thus the
system tends to follow the solution of (1) but is continually
being disturbed by the probability effects, which may be thought
of as something like molecular collisions of the surrounding gas
on a moving particle. We are interested in the limiting case
where the disturbing effects are very rapid but very small. If
we assume that the disturbance is homogeneous and isotropic,
this can be represented by an additional term in the equation of
the heat flow type
$$K \nabla^2 P.$$
In the more general case certain directions may be preferred, and
certain regions may have greater perturbation effects. Thus there
will generally be a small ellipsoid of probability about each point,
and a corresponding positive definite quadratic form
defined over the phase space. This form describes the local
statistical perturbing effects, for each point.
The equation then assumes the form
This partial differential equation governs the flow of probability
in the phase space. With an ensemble of systems distributed at
t = 0 according to $P_0(x^i)$
the distribution at a later time $t_1$ is the solution of this
equation for
This equation is linear and of parabolic type (in t).
In the $x^i$ it is elliptical, since $a^{ij}$ is positive definite.
Tao total .robubiUtj la tU jftaao 0j*«* *«asia o^staai, for if
vt lot
/ (a1* 5^ ♦ *« • «
tfco latogral boia* ow o * xffUi*aUy Xar*o oarfaoo, ud ^ t&o
volt awaalt
Xf a1* to aosltivo oafiaito «o4 oota a1** aa*
ar« ooatUwotui la tao aaaao aaaoo turn 4iatri»«tioa v approaaM
a ual$*o Halt as t HMK ma Halt la alia«r s«o owr*a«*ot
tao pNfesalUty JOtaroaUa* to Uf laltf o* a «o*iatt« Uaitiag 4i#-
tritouoa r* alta .
CM
ft* aay %•
f*a iiaitiaa alatritottloa am*t aatlofjr tao olU#tioal
ofuatloa ottaiaoa ay oottla* || • 0,
To show that the distribution approaches a limit let
$P_1$ and $P_2$ be two different solutions of (1). Then the
difference $Q = P_1 - P_2$ also satisfies the equation and Q is
positive in one region B and negative in the remainder of the
space. Consider the quantity
U auat deer ease for
where S la tae surface of tae reeioa B aad T la tae outward
Telooity of tale ear face. Since Q vanishes an the surface, tae
aeooad tern la aero, aad tae first la
Toluae iategrale of diYeraaaeea aad traaafora aj tae
i
usual theorems lato surface integrale
V
tae aeooad tera age la vanishes alace Q - 0 on S. la tae first
term «A la la tae direction of ^ a© at any point we have
< 0
Tims a aj initial distribution
?a «4 ?j H dearaaaia«.
•BprMMM t*» MM Xiait.
i
• I I*
It «^ is SeuiUiMOOS, *ftt tots ft <U»«aatHuiUyt
PwiH b#> o&u lienors, sad tfcs ▼sotor SUE ftl— aa i t— tsassj »
Ths saouat of tiiia di««oatiault/ Is £U «& fcy
ft1* - ?j) • - If* - ?*) »
*frtr« tht b***sd «a4 uafcsjrr »d l«n«r* ***** ts> ti»« two tide*
of t&« dltesoiiiuUt/. Tims
SMMyiftlsai Aft Mm *»a i1£m o# s*sft i 1 nana** ****** g>gj -
Xft tSM sUpisst Oft« &l»«ASiS*%l •*»* wft fcm
If wo «tort with ft «opiko* of prooaoilitr ioaaUaoa
at oao point, ta« I— tllato aoaowiar aaa bo aaaarlaoa la oittjOo
tor a*, aoar talt poUt wa **r ohaaao a1* aad f1 to bo aoaotaat.
Do» to tao f1 tao aolxo otartt «crln« vita a ▼•lojUy/*, 9111141
too pro»«oUltr tors a1* •pr«*de it out. If wo oottt wUtt«i
fro* af to
wo aooo -
* ' „ „. - "'
aod too •quatioa boaoaoa
taio ia tha o^uatioa far aoat flaw la aa aaiootropla Bodlua.
Thai ia ftao y* aooraiooto too «»i*o dlffaooa out lata a mwu&m
al»%rlb*tlaa *ita qoaArotU form a**| for th« firot afcort iatorroi
of tiaa
waoro A. « it tao laroroa fora of a1*
feliaa Toioauy rial* gaj aom^aaaaoaa at*u«ti«ai .wta.
Om portioalo? mm of la tor tot 1* ttei la w&iaa
is tUo opooo. ?at a oao a&aooslaaal aaaaa opoco,tfeo a$uatlaa U)
taaa aaoaaao ta« faxa
A coaoxal solution far tola o*§o &«s *soa foa&u It a*? *o dosaria aa
*a mxoi>a* It wae laltlol 41*t*iteatl©a i» a s foactioa, aa taa
sjrataa (or 0^aeabJL«) ia fcaooo to aaaa a daflalta talus at x at
t * 0, say P$ taaa at \± taa diatribe Uoa is aoraal* ?ao saatax
aM^a aa^MP ^^^W^ft^^rd IsV^^^aa^aV^^Oj^ ^9 s^-$ jjj^L^WW^
Taus taa attn £ oaroaaas alaaa, taa ium suits aa taa aystoa aaaid
follow am taa atatiattaal sff oata aasaat* Hm tarlaaaa a*
iaoraaaoa axyaaaatiaUy to a Ualtia* taiaa a/a aita aalf taa tlaa
to ay ova taat taia la taa aalatlaa it la oaly aaaosaojy
to saastitats la taa oqoatiea (*) , k* t —a* too tiatrisatloa
approaafcaa a normal aao saatarad aa ««ro ultn a* « a/a*
M • |U - of*)
«* » $ (1 • O****)
«iu oa oroitrarr iaitioi aiotritaUoo ?aU) too oolottoo ono bo
written *• ma mte*r&l ««lo« U&* aotooo of lu^iiUm of keo*
flow 9robl«gt»«
• / **m *
foe eeoe teaerol rooolto aoX4 la toe I aiooaeioaal
I*hi wh*$i it i ltft»»y fere *&d e^ 1a eooetnat* A *OollEO#
of probability eroo&eaa iAte o oorool Aletrleotleo* toe ooefte*
folio* la* tfit dtlsrslMU trejeetery oad toe qooArotlO for*
vfeloh tekeo toe jtliot of the etaoaor4 eOriatloo toMMNOooi eat*
oeoeatioUy towt o eef Ulte limit. *ae eveloeties of too
e one tea to io obob aero eoopUeat** 1* tale eeoe oeeew, ftoe
eeootlooe for too fiaal aietrloaUoe oro *i*eo io too ejeeodis«
Xt la t&t oao Alaoaoloaal llaaa* aaoo «• rtwt alta a
aoxaal 4lat*l*atioa aaatoroa oa ao*o aita a* • £ , tao distriOuUe*
hm am ttftttjr alta t&« Xoxm. Aa io&iTi&ual oyttoa oxoaotaa
•totlotioal aoUaa aooot aoro aaa tao oaaaablo of »jst*m* prodoooo
aa oaaoaalo of tiao oarloa. Tail mmiU* aaa b« oooa to ao
oaultaloat to taoraal aoiao waiea aaa oooa p*»»ed tirou^a a t Utor
with troa»f«r aaaxaoterlotla
loa&lag to a po»or opaotrua for ta* aoloo
To aaow tola, tao aatoaoxrolatioa aa/ oa o*icul*t«a, Urotoaa
vaooo vaXuo at t • 0 !• P aato a aoraai distribution oaatataA
m * t^ ia
Aiotriootioa at t * 4 la aoeraal vita a§ • J .
aaA tala ia too autoo jxrolatioa.
too power apootnta la tao laavia* taraaafon at aula
M
mil
cystic ^^^^^^^^oa -^x .-^n..
4
ft • JLfftf*} ft ♦ *(*) F)
#% OX 9*
mix) t 0. la **• »t4»4y «t*t«
*UJ f* ♦ *(x) * • 0
twadBi ?, 0 «* x «*» ± • * o
*U) 1 fix) p • o
I * 1 1*1
A 1» A«t«ralA*4 V *&• •o&AiUaa |p
ttmMi it is *•*•*»•*? /tlx) ii
fix) »> •
f (x) • x< •
-4.
• IS*
»t obt&U ft* **• ma •tatloattry oolutioa
•V1* - ' .
^ s-*M
- .«
of «x?oa«aUftl« 6««?«ftftl&£ lot»4 * «.
*&6 I* wtwttsl
fte satisfy dp • o »• tfc*
this v««>1ym
•a* *1m»
DATA SMOOTHING AND PREDICTION IN FIRE-CONTROL SYSTEMS
By R. B. Blackman, H. W. Bode, and C. E. Shannonᵃ

ᵃ Bell Telephone Laboratories.

THE problem of data smoothing in fire control arises because
observations of target positions are never completely accurate. If
the target is located by radar, for example, we may expect errors in
range running from perhaps 10 to 50 yards in typical cases. Angular
errors may vary from perhaps one to several mils, corresponding, at
representative ranges, to yardage errors about equal to those
mentioned for range. Similar figures might be cited for the errors
involved in optical tracking by various devices. Evidently these
errors in observation will generate corresponding errors in the final
aiming orders delivered by the fire-control system.

A data-smoothing device is a means for minimizing the consequences of
observational errors by, in effect, averaging the results of
observations taken over a period of time. The simplest example of
data smoothing is furnished by artillery fire at a fixed land target.
Here the principal parameter is the range to the target. While
individual determinations of the range may be somewhat in error, a
reliable estimate can ordinarily be obtained by taking the simple
average of a number of such observations. This example, however, is
scarcely a representative one for problems in data smoothing
generally. The errors involved are small and the averaging process is
an elementary one. Moreover, the data-smoothing process is not of
very decisive importance in any case, since any errors which may
exist in the estimated range can normally be wiped out merely by
observing the results of a few trial shots.

More representative problems in data smoothing arise when we deal
with a moving target. In this case errors in observational data may
be much more serious, since they determine not only the present
position of the target but also the rates used in calculating how
much the target will move during the time it takes the projectile to
reach it. An illustration is furnished by antiaircraft fire against
distant airplanes. Suppose, for example, that in observing the
target's position we make two errors of opposite sign and a second
apart, of 25 yards each. Then the apparent motion of the airplane is
in error by 50 yards per second. Since the time of flight of an
antiaircraft shell in reaching its target may be as high as 30
seconds or more, such an error might produce a miss of the order of
1 mile. It is clear that in any comparable situation the effect of
observational errors in determining the target rate will be much
greater than the position error alone would suggest, and the function
of the data-smoothing network in averaging the data so that even
moderately reliable rates can be obtained as a basis for prediction
becomes a critically important one.

Aside from magnifying the consequences of small errors in target
position, the motion of the target complicates the data-smoothing
problem in two other respects. The first is the fact that it gives us
only a brief time in which to obtain suitable firing orders. The
total engagement is likely to last for only a brief time, and in any
case it is necessary to make use of the data before the target has
time to do something different. Thus the averaging process cannot
take too long. The second complication results from the fact that the
true target position is an unknown function of time rather than a
mere constant. Thus many more possibilities are open than would be
the case with fixed targets, and the problem of averaging to remove
the effects of small errors is correspondingly more elusive.

The intimate relation between data smoothing and target mobility
explains why the data-smoothing problem is relatively new in warfare.
The problem emerged as a serious one only recently, with the
introduction of new and highly mobile military devices. The airplane
is, of course, the archetype of such mobile instruments, and we have
already mentioned the data-smoothing problem as it appears in
antiaircraft fire. Since the relative velocity of airplane and ground
is the same whether we station ourselves on one or the other,
however, the
mobility of the airplane produces essentially
the same sort of problem in the design of bomb-
sights also. Another field exists in plane-to-
plane gunnery. Although they are somewhat
slower, the mobility of such vehicles as tanks
and torpedo boats is still considerable enough
to create a serious problem here also. Future
examples may be centered largely on robot
missiles. It is interesting to notice that a
guided missile may present a problem in data
smoothing either because it belongs to the
enemy, and is therefore something to shoot at,
or because it belongs to us, and requires
smoothing to correct errors in the data which
it uses for guidance. The tendency to higher
and higher speeds in all these devices must
evidently mean that fire control generally, and
data smoothing as one aspect of fire control,
must become more and more important, unless
war making can be ended.
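
The earlier example of two 25-yard errors of opposite sign a second
apart can be put in numbers. The sketch below is illustrative only:
the straight-line course, the 150 yards per second speed, the
ten-second observation window, the 30-second time of flight, and the
use of an ordinary least-squares line as the "averaging" are all
assumptions made here, not the smoothing networks the monograph goes
on to design.

```python
def two_point_rate(times, positions):
    """Rate from the first two observations only."""
    return (positions[1] - positions[0]) / (times[1] - times[0])

def least_squares_rate(times, positions):
    """Slope of the least-squares straight line through all observations."""
    n = len(times)
    t_mean = sum(times) / n
    p_mean = sum(positions) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

TRUE_RATE = 150.0        # yards per second (assumed)
TIME_OF_FLIGHT = 30.0    # seconds (assumed)

times = [float(k) for k in range(10)]                # one look per second
errors = [25.0 * (-1) ** k for k in range(10)]       # +25, -25, +25, ...
observed = [1000.0 + TRUE_RATE * t + e for t, e in zip(times, errors)]

raw_error = two_point_rate(times, observed) - TRUE_RATE
smoothed_error = least_squares_rate(times, observed) - TRUE_RATE

print(raw_error, raw_error * TIME_OF_FLIGHT)            # rate error, miss
print(smoothed_error, smoothed_error * TIME_OF_FLIGHT)
```

With these assumed figures the unsmoothed rate is wrong by 50 yards
per second, a miss of 1500 yards after 30 seconds, while the rate
fitted over the ten-second window is wrong by about 1.5 yards per
second.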
Very mobile instruments of war, such as
the airplane, began to make their appearance
in World War I, but there was insufficient time
during that war to make much progress with
the fire-control problems which such instru-
mentalities imply. In the interval between
World War I and World War II, however, a
considerable number of fire-control devices,
such as bombsights and antiaircraft compu-
ters, were developed. The principal attention
in the design of these devices, however, was
on the kinematical aspects of the situation.
Although a number of them included fairly
successful methods of minimizing the effects of
observational errors,ᵇ it seems fair to say that
in the interval between the two wars there
was no general appreciation of the existence of
the data-smoothing problem as such.

ᵇ Most of these solutions depended upon the use of
special types of tracking systems. Examples are found
in the use of regenerative tracking in bombsights and
antiaircraft computers or in the determination of rates
from a precessing gyroscope or an aided laying
mechanism in an antiaircraft tracking head. So far as their
effect on the data-smoothing characteristics of the
overall circuit is concerned, these devices are
equivalent to simple types of smoothing networks inserted
directly in the prediction system. This is discussed in
more detail under the heading "Exponential Smoothing,"
Section 10.1.

It follows that the theory of data smoothing
advanced in this monograph is the result principally
of experience gained in World War II.
More specifically, it is the product of the experience
of the authors with a series of projects,
largely sponsored by Division 7 of NDRC,
concerned with the design of electrical antiair-
craft directors. In addition, it draws largely
on the results of a number of other investiga-
tions, also NDRC sponsored. The possible key
importance of data smoothing in the design of
fire-control systems was recognized by Division
7 early in the course of its activities and the
emphasis placed upon it in a number of projects
led to the accumulation of a much larger
body of results than might otherwise have been
obtained.
Data smoothing is developed here in terms
of concepts familiar in communication engi-
neering. This is a natural approach since data
smoothing is evidently a special case of the
transmission, manipulation, and utilization of
intelligence. The other principal, and perhaps
still more fundamental, approach to data
smoothing is to regard it as a problem in sta-
tistics. This is the line followed in the classic
work¹ by Norbert Wiener.ᶜ For reasons which
are brought out later, Wiener's theory is not
used in the present monograph as a basis for
the actual design of data-smoothing networks.
Because of its fundamental interest, however,
a sketch of Wiener's theory is included. The
authors' apologies are due for any mutilation
to the theory caused by the attempt to simplify
it and compress it into a brief space.

ᶜ Wiener is also responsible for providing tools which
permit the gap between the statistical and communication
points of view to be bridged.

The present monograph falls roughly into
two dissimilar halves. The first half, consisting
of the first three or four chapters, includes
a discussion of the general theoretical foundations
of the data-smoothing problem, the best
established ways of approaching the problem,
the assumptions they involve, and the
authors' judgment concerning the assumptions
which best fit the tactical facts. In this part
may also be included the last chapter, which
contains a fragmentary discussion of alternative
data-smoothing possibilities lying outside
the main theoretical framework of the monograph.
The rest of the monograph is concerned with
the technique of designing specific data-smoothing
structures. A fairly elaborate and detailed
treatment is given here, in the belief that the
problem of actually realizing a suitable data-
smoothing device is, in some ways at least,
as difficult as that of deciding what the general
properties of such a device should be. The
technique, as given, draws heavily upon the
highly developed resources of electric network
theory. For this reason the discussion is
couched entirely in electrical language, al-
though the authors realize, of course, that
equivalent nonelectrical solutions may exist.
For the benefit of readers who may not be
familiar with network theory, the monograph
includes an appendix summarizing the prin-
ciples most needed in the main text.
Two further remarks may be helpful in un-
derstanding the monograph. The first concerns
the relation between data smoothing and the
overall problem of prediction in a fire-control
circuit. These two are coupled together in the
title of the monograph, and it is clear that the
connection between them must be very close,
since, as we saw earlier, small irregularities in
input data are likely to be serious only as they
affect the extrapolation used to determine the
future position of a moving target. In the
statistical approach, in fact, data smoothing
and prediction are treated as a single problem
and a single device performs both operations.
In the attack which is treated at greatest
length in the monograph a certain distinction
between data smoothing and prediction can be
made. To simplify the exposition as much as
possible, the explicit discussion in the monograph
is directed principally at data smoothing.
This, however, is not intended to suggest
that there is any real cleavage between the
two problems or that the analysis as developed
in the monograph does not also bear, by impli-
cation, upon the prediction problem. Any the-
ory of data smoothing must rest ultimately
upon some hypothesis concerning the path of
the target, and the exact statement of the as-
sumptions to be made is in many ways the most
important as well as the most difficult part of
the problem. The same assumptions, however,
are also involved in the extrapolation to the
future position of the target. It is thus impos-
sible to solve the data-smoothing problem with-
out also implying what the general nature of
the prediction process will be. For example,
the formulation given in Chapter 9 amounts to
the assumption that the target path is specified
by a set of geometrical parameters correspond-
ing to components of velocity, acceleration, etc.
The data-smoothing process centers about the
problem of obtaining reliable values for these
parameters. To obtain a complete prediction
thereafter, it is merely necessary to multiply
the parameter values thus obtained by suitable
functions of time of flight and add the results
to the present position of the target.
The other general remark concerns the tacti-
cal criteria used in evaluating the performance
of a data-smoothing system. This turns out to
be one of the most important aspects of the
whole field. It is assumed here that the tactical
situation is similar to that of antiaircraft fire
against high-altitude bombers in World War
II. The defense can be regarded as successful if
only a fairly small fraction of the targets en-
gaged are destroyed. On the other hand, the
lethal radius of the antiaircraft shell is so small
that it is also quite difficult to score a kill.
Under these circumstances we are interested
only in increasing the number of very well
aimed shots.
When we combine these assumptions with
the path assumptions described in Chapter 9
we are led to the data-smoothing solution for-
mulated here, in preference to the solution ob-
tained with the statistical approach. On the
other hand, we might equally well envisage a
situation in which the target contained an
atomic bomb or some other very destructive
agent, so that it becomes very important to
intercept it, while the lethal radius of the anti-
aircraft missile is correspondingly increased,
so that great accuracy is not needed for a kill.
In this situation our interest would be focused
on the problem of minimizing the probability
of making large misses, and the solution furnished
by the statistical approach would be approximately
the best obtainable.ᵃ
a In fairness to the statistical solution it should be
pointed out that it is also the best obtainable, without
regard to the lethal radius of the shell, if we replace
the path assumptions made in Chapter 9 by a "random
phase" assumption. The path assumptions in Chapter
9 are almost at the opposite pole from a random phase
assumption, and represent a deliberate overstatement,
made in order to illustrate the theoretical situation as
clearly as possible.
Chapter 7
GENERAL FORMULATION OF THE DATA-SMOOTHING PROBLEM
ONE of the principal difficulties in any
treatment of data smoothing is that of
stating exactly what the problem is and what
criteria should be applied in judging when we
have a satisfactory solution. It is consequently
necessary to embark upon a rather extensive
general discussion of the data-smoothing prob-
lem before it is possible to consider specific
methods of designing data-smoothing struc-
tures. This preliminary survey will occupy
Chapters 7, 8, and 9. As a first step this chap-
ter will describe two of the general ways in
which the data-smoothing problem can be ap-
proached mathematically. The formulation of
the problem which is finally reached in Chap-
ter 9 is not the one which is most obviously
suggested by these approaches. This, however,
does not lessen their value in characterizing
the problem broadly.
7.1 A PHYSICAL ILLUSTRATION
In an actual fire-control system the data-
smoothing problem is usually made fairly spe-
cific because of the particular geometry
adopted in the computer. It may be helpful
to have some particular case in mind as a
touchstone in interpreting the general discus-
sion. For this purpose the most appropriate
example is furnished by long range land-based
antiaircraft fire, since most of the analysis
described in this monograph was developed
originally for its application to this problem.
It is usually assumed in the antiaircraft prob-
lem that the target flies in a straight line at
constant speed, and in one case at least the
computer operates by converting the input data
into Cartesian coordinates of target position
and differentiating these to find the rates of
travel in the several Cartesian directions.
These rates form the basis of the extrapolation.
The process is illustrated in Figure 1. The
input coordinates are transformed into electrical
voltages proportional to xP, yP, and zP,
the Cartesian coordinates of present position,
in the coordinate converter at the left of the
diagram. The extrapolation for x is shown
explicitly. It consists essentially in differentiating
to find the x component of target
velocity, multiplying the derivative by the time
of flight tf and adding the result to xP to find
xF, the predicted future value of x. A similar
procedure fixes yF and zF. After the addition
of certain ballistic corrections, these three coordinates
of future position are transformed
into gun aiming orders in the coordinate converter
shown at the right of the drawing. This
last unit also provides the time of flight required
as a multiplier in the extrapolation.

[Figure 1. Data ... prediction circuit.]
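The extrapolation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_coordinate`, the backward-difference rate, and the numerical values are hypothetical and are not taken from the monograph.

```python
def predict_coordinate(samples, dt, time_of_flight):
    """Linear extrapolation of one Cartesian coordinate.

    samples: recent present-position values, oldest first, spaced dt apart.
    The rate is a backward difference of the last two samples, that is, an
    unsmoothed derivative chosen here purely for illustration.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to form a rate")
    x_present = samples[-1]
    rate = (samples[-1] - samples[-2]) / dt       # differentiate
    return x_present + time_of_flight * rate      # xF = xP + tf * (rate of x)


# Assumed example: a target moving at 150 units/sec along x, sampled every 0.5 sec.
track = [1000.0 + 150.0 * 0.5 * k for k in range(5)]
print(predict_coordinate(track, dt=0.5, time_of_flight=10.0))
```

With noiseless data the prediction is exact for a straight-line course; the sketch contains no smoothing at all.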
The small irregularities in the input data
caused by tracking errors are greatly magni-
fied by the process of differentiation. It is thus
necessary to smooth the rates considerably if
a reliable extrapolation is to be secured. The
data-smoothing network for the x coordinate is
represented by N1 in Figure 1. Since the Cartesian
velocity components are theoretically
constants if the assumption of a straight line
course at constant speed is correct, a data-
smoothing network in this computer must be
essentially an averaging device which gives
an appropriately weighted average of the fluc-
tuating instantaneous rate values fed to it. The
problem of "smoothing a constant" is given
special attention in Chapter 10. Aside from the
particular circuit of Figure 1, we may, of
course, be required to smooth a constant when-
ever the prediction is based upon an assumed
geometrical course involving one or more pa-
rameters which are isolated in the circuit.
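A rough numerical illustration of "smoothing a constant" follows. It is a sketch only: the tracking errors are assumed to be independent Gaussian errors, the weights are taken as uniform, and all names and numbers are hypothetical.

```python
import random


def raw_rates(positions, dt):
    """Backward-difference rates; tracking errors are magnified by 1/dt."""
    return [(b - a) / dt for a, b in zip(positions, positions[1:])]


def weighted_average(values, weights):
    """Weighted average of the most recent len(weights) values."""
    recent = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


random.seed(0)
dt, speed, sigma = 0.1, 150.0, 5.0            # assumed, illustrative numbers
positions = [speed * dt * k + random.gauss(0.0, sigma) for k in range(201)]
rates = raw_rates(positions, dt)

uniform = [1.0] * len(rates)                  # equal weights over all the data
smoothed = weighted_average(rates, uniform)
worst_raw = max(abs(r - speed) for r in rates)
print(f"worst raw rate error: {worst_raw:.1f}")
print(f"smoothed rate error:  {abs(smoothed - speed):.2f}")
```

The instantaneous rates scatter widely about the true speed, while the averaged value lies much closer to it. Other weightings can be tried by changing the `weights` list.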
In addition to smoothing the rates we can,
if we like, attempt to smooth the irregularities
in present position also. A network to accom-
plish this purpose is indicated by the broken
line structure N2 in Figure 1. Of course, in
dealing with the present position we are no
longer smoothing a constant, but suitable struc-
tures can be obtained by methods described
later. However, the effect of tracking errors in
the present position circuit is so much less than
it is in the rate circuit that N2 can generally
be omitted.
Geometrical assumptions of the sort implied
in Figure 1 are helpful in visualizing the prob-
lem, and they are of course of critical impor-
tance in determining what the final data-
smoothing device will be. It is important not
to make explicit assumptions of this kind too
early in the formal analysis, however, since
the meaning of such assumptions is one of the
aspects of the general problem which must be
investigated. For example, it is apparent that
no airplane in fact flies exactly a straight line,
nor flies a straight line for an indefinite period.
In detail, the solution of the data-smoothing
problem depends very largely on how we treat
these departures from the idealized straight
line path. For the present, consequently, it will
be assumed that the input data are presented
to the data-smoothing and predicting devices
in terms of some generalized coordinates, the
nature of which we will not inquire into too
closely. A given coordinate might, for example,
be a velocity, a radius of curvature, an angle of
dive or climb, or any other quantity which
would be directly useful in making a predic-
tion, or it might be a simple position coordi-
nate such as an azimuth or an altitude.
The data-smoothing and predicting opera-
tion itself is assumed to be performed by linear
invariable devices. Aside from the fact that
this assumption is, of course, a tremendously
simplifying one, it also fits the data-smoothing
problem very nicely, as the problem is formu-
lated in this chapter. With other formulations,
however, it appears that somewhat better re-
sults may be obtainable from variable devices
or devices including more or less radical
amounts of nonlinearity. These possibilities are
discussed briefly in Chapter 14.
7.2 DATA SMOOTHING AND PREDICTION
Figure 1 illustrates a distinction between
two possible methods of looking at the data-
smoothing problem which it is advisable to
establish for future purposes. In describing the
x system in Figure 1 we laid emphasis on the
particular networks N1 and N2. It is clear, however,
that the complete x circuit with input xP
and output xF is a network having overall
transmission properties which can be studied.
Since tf will normally vary with time, the network
is not, strictly speaking, an invariable
one, but the changes of tf are ordinarily too
slow to make this an essential consideration.
When it is necessary to make a distinction
between these points of view, a network such
as N1, which is merely an element in the prediction
process, will be called a data-smoothing
structure. An overall circuit, providing data
smoothing and prediction in one step, will be
called a data-smoothing and prediction net-
work, or simply a prediction network. Al-
though these points of view have been illus-
trated for rectangular coordinates, they obvi-
ously apply also in many other situations. For
example, we might go so far as to apply the
overall point of view to a complete circuit from
input azimuth, say, to output azimuth.
Both points of view are taken from time to
time in the monograph. When possible, how-
ever, principal attention has been given to the
limited data-smoothing problem. This tends to
simplify the discussion, since the limited prob-
lem is evidently more concrete than the overall
prediction problem. Moreover, it permits us to
deal lightly with such questions as the particu-
lar choice of coordinates in which the smooth-
ing operations are conducted, since it assumes
that the general kinematical framework of pre-
diction has already been decided upon. On the
other hand, the overall point of view is more
effective in certain situations, and it is the only
natural one in the statistical treatment de-
scribed in the next section.
7.3 DATA SMOOTHING AS A PROBLEM IN TIME SERIES
The most direct and perhaps the most gen-
eral approach to data smoothing consists in re-
garding it as a problem in time series. This
is the approach used by Wiener in his well-
known work.1 It essentially classifies data
smoothing and prediction as a branch of statis-
tics. The input data, in other words, are
thought of as constituting a series in time
similar to weather records, stock market prices,
production statistics, and the like. The well-
developed tools of statistics for the interpreta-
tion and extrapolation of such series are thus
made available for the data-smoothing and
prediction problem.
To formulate the problem in these terms,
let f(t) represent the true value of one of the
coordinates of the target and let g(t) represent
the observational error. Then f(t) and
g(t) are both time series in the sense just
defined. The set of all such functions corresponding
to the various possible target courses
and tracking errors form an ensemble of time
series or a statistical population. One can imagine
that a large number of particular functions
f(t) and g(t) have been recorded, each
with a frequency proportional to its actual
frequency of occurrence. Wiener assumes that
they are stationary, that is, that the statistical
properties of the ensemble are independent of
the origin of time. This, of course, implies that
both functions exist from t = −∞ to t = +∞.
We will sometimes find it more convenient to
make the assumption that the two functions
vanish after some fixed, but sufficiently remote,
points on the positive and negative real t axis.ᵃ
The input signal to the computer is of course
f(t) + g(t). If we assume that the coordinate
in question represents a position, the quantity
we wish to obtain is f(t + tf), where tf represents
the prediction time. If the coordinate is
a rate, we are interested in an average value
of f(t) over the prediction interval. This com-
plicates the mathematics somewhat, but does
not essentially affect the situation.
a This is done for technical mathematical reasons. We
shall later have occasion to consider the Fourier transforms
of f(t) and g(t), and, to have well-defined transforms,
the integrals of the squares of the two functions,
from t = −∞ to t = +∞, should be finite. This
would not happen under the "stationary" assumption.
Wiener avoids the difficulty by introducing what he
calls a generalized harmonic analysis, but this method
is far too complicated to be treated in a brief sketch
like the present.
We shall not, of course, be able to predict
f(t + tf) perfectly accurately. Let the predicted
value be represented by f*(t + tf). In
virtue of our assumption that the data-smoothing
and prediction circuit is to be a
linear invariable network, the relation between
f*(t + tf) and the total input signal f(t)
+ g(t) can be written as

\[ f^*(t + t_f) = \int_0^\infty \left[ f(t - \sigma) + g(t - \sigma) \right] dK(\sigma) \tag{1} \]

where dK(σ) represents the effect of the data-smoothing
and prediction circuit. Comparison
to equations (2) and (5) of Appendix A shows
that K is, in fact, the indicial admittance of
this circuit. The particular problem to be
solved is of course that of finding a shape for
the function K(σ) which will make f*(t + tf)
the best possible estimate of f(t + tf).
The fact that the range of integration
in equation (1) begins at σ = 0 is particularly
to be noted. It corresponds to the fact that
in making a prediction we are entitled to use
only the input data which has accumulated up
to the prediction instant. This restriction will
be conspicuous in the next chapter, where the
time-series analysis is completed.
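A discrete stand-in for equation (1) may make the restriction to past data concrete. The sketch below is illustrative only: the sampled form, the function names, and the two-weight example (which reproduces the straight-line extrapolation of Figure 1) are assumptions, not the monograph's own construction.

```python
def linear_predictor(history, weights):
    """Discrete stand-in for equation (1).

    history: input samples f + g, oldest first, ending at the present instant.
    weights[k]: weight applied to the sample k steps in the past; it plays the
    role of the increment dK over that step. Only present and past samples
    can enter, since the weights are indexed by non-negative lags.
    """
    if len(weights) > len(history):
        raise ValueError("not enough past data for this weighting")
    return sum(w * history[-1 - k] for k, w in enumerate(weights))


def straight_line_weights(time_of_flight, dt):
    """Two weights reproducing x(t) + tf * (x(t) - x(t - dt)) / dt."""
    lead = time_of_flight / dt
    return [1.0 + lead, -lead]


history = [2.0 * k for k in range(10)]          # noiseless ramp, dt = 1
print(linear_predictor(history, straight_line_weights(3.0, 1.0)))
```

Longer weight lists correspond to circuits that draw on more of the past record; the choice of those weights is the design problem discussed in the text.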
7.4 THE AUTOCORRELATION
The principal statistical tool used in studying
equation (1) is the so-called autocorrelation.
Under the "stationary" assumption the
autocorrelation for f(t) is defined by

\[ \phi_1(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau) f(t)\, dt. \tag{2} \]

We can obtain a normalized autocorrelation,
which is more convenient for some purposes,
by dividing by φ1(0). This gives

\[ \phi_{1N}(\tau) = \frac{\phi_1(\tau)}{\phi_1(0)} = \lim_{T \to \infty} \frac{\int_{-T}^{T} f(t + \tau) f(t)\, dt}{\int_{-T}^{T} [f(t)]^2\, dt}. \tag{3} \]
If we assume that f(t) in fact vanishes for
sufficiently large positive or negative values of
t, the limit sign can be disregarded and φ1N(τ)
becomes simply
\[ \phi_{1N}(\tau) = \frac{1}{W} \int_{-\infty}^{\infty} f(t + \tau) f(t)\, dt \tag{4} \]

where W = ∫[f(t)]² dt, taken from t = −∞ to t = +∞, and represents the total
"energy" in the time series f(t).
Precisely similar expressions can be set up
for the autocorrelation φ2(τ) or φ2N(τ) of the
observational error function g(t). In a general
case we might also have to worry about
a possible cross correlation between f(t) and
g(t). This would be represented by a cross-correlation
function φ12(τ), obtained by integrating
the product f(t + τ)g(t). In practical
fire control, however, it can be assumed that
the correlation between target course and
tracking errors is small enough to be neglected.
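As a simple example of the calculation of
an autocorrelation we may assume that f(t) =
sin ωt. Then

\[ \begin{aligned}
\phi_1(\tau) &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \sin \omega(t + \tau) \sin \omega t \, dt \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \tfrac{1}{2} \left[ \cos \omega\tau - \cos (2\omega t + \omega\tau) \right] dt \\
&= \tfrac{1}{2} \cos \omega\tau,
\end{aligned} \tag{5} \]

since the term cos (2ωt + ωτ) will contribute
nothing in the limit.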
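The result in (5) can be checked numerically. The sketch below approximates the time average in (2) over a long but finite interval by the midpoint rule; the function name, the value of ω, and the interval length are assumptions chosen for illustration.

```python
import math


def autocorrelation(f, tau, half_span, steps=100_000):
    """Approximate (1/2T) * integral from -T to T of f(t + tau) f(t) dt.

    half_span is T; the integral is evaluated by the midpoint rule.
    """
    dt = 2.0 * half_span / steps
    total = 0.0
    for i in range(steps):
        t = -half_span + (i + 0.5) * dt
        total += f(t + tau) * f(t)
    return total * dt / (2.0 * half_span)


omega = 2.0                                   # assumed angular frequency
f = lambda t: math.sin(omega * t)
for tau in (0.0, 0.5, 1.0):
    numeric = autocorrelation(f, tau, half_span=500.0)
    print(tau, round(numeric, 4), round(0.5 * math.cos(omega * tau), 4))
```

For a finite T the neglected cosine term leaves a small residue that shrinks in proportion to 1/T, so the two printed columns agree only approximately.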
The maximum value of φ1(τ) in (5) is found
at τ = 0. This is to be expected, since obviously
the correlation between identical values
of the function is the best possible. What
is exceptional about the present result is the
fact that φ1(τ) is not small for all large τ's.
This is fundamentally a consequence of the
fact that we chose an analytic expression for
f(t), so that the relation between two values
of the function is completely determinate, no
matter how great the difference between their
arguments. In a more representative time
series, involving a certain amount of statistical
uncertainty, we would expect φ1(τ) to approach
zero as τ increases, reflecting the increasing
importance of statistical dispersion as
the time interval becomes greater.
The significance of the autocorrelation func-
tion for data smoothing and prediction is ob-
vious without much study. Thus, suppose for
simplicity that the observational error g(t)
is zero. Then the autocorrelation φ1(τ) is the
only one involved. It is a measure of the ex-
tent to which the true target path "hangs to-
gether" and is thus predictable. For example,
in weather forecasting it is a well-known prin-
ciple that in the absence of any other infor-
mation it is a reasonably good bet that tomor-
row's weather will be like today's but that the
reliability of such a prediction diminishes rap-
idly if we attempt to go beyond two or three
days. This would correspond to an autocorrelation
function which is fairly large in the neighborhood
of τ = 0, but diminishes rapidly to zero
thereafter.
In a similar way the autocorrelation of the
observational error g(t) represents the extent
to which this error hangs together. In this
case, however, a high correlation is exactly
what we do not want. Thus, if φ2(τ) vanishes
rapidly as τ increases from zero, closely neighboring
values of g are quite uncorrelated, and
we need only average the input data over a
short interval in the immediate past in order
to have most of the observational errors averaged
out. If φ2(τ) is substantial for a much
longer range, on the other hand, a much longer
averaging period is necessary, with correspondingly
greater uncertainties in the value
obtained for f(t).
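The effect of error correlation on the averaging period can be illustrated with sampled data. The sketch assumes a plain average of n successive error samples and two hypothetical error autocorrelations, one vanishing at every nonzero lag and one decaying slowly; neither is taken from the monograph.

```python
def variance_of_average(phi2, n):
    """Mean square value of the average of n successive error samples.

    phi2(k) is the autocorrelation of the error at a lag of k samples, so the
    result is the double sum of phi2(i - j) divided by n squared.
    """
    return sum(phi2(i - j) for i in range(n) for j in range(n)) / n**2


def uncorrelated(k):
    """Error with unit mean square and no correlation between samples."""
    return 1.0 if k == 0 else 0.0


def slowly_decaying(k, rho=0.9):
    """Error whose correlation falls off as rho**|k|; rho is an assumed value."""
    return rho ** abs(k)


for n in (1, 10, 100):
    print(n, round(variance_of_average(uncorrelated, n), 4),
          round(variance_of_average(slowly_decaying, n), 4))
```

With uncorrelated errors the residual falls as 1/n, whereas with the slowly decaying autocorrelation a much longer average is needed to reach the same residual.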
7.5 THE LEAST SQUARES ASSUMPTION
The autocorrelation function does not in itself
suffice to determine a time series completely.
For example, it is easily seen that the
functions sin t + sin 2t and sin t + cos 2t have
the same autocorrelation in spite of the fact
that they represent waves of quite different
shape. The autocorrelation function, however,
has a peculiar importance in the fact that
under many circumstances it is the only piece
of information about the time series which we
need to know.
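The statement about sin t + sin 2t and sin t + cos 2t can be checked numerically. The sketch below again approximates the time average of (2) over a finite interval; the names and interval length are illustrative assumptions, and the common value ½ cos τ + ½ cos 2τ follows from applying (5) to each component.

```python
import math


def autocorrelation(f, tau, half_span=500.0, steps=50_000):
    """Midpoint-rule estimate of (1/2T) * integral of f(t + tau) f(t) dt over [-T, T]."""
    dt = 2.0 * half_span / steps
    total = 0.0
    for i in range(steps):
        t = -half_span + (i + 0.5) * dt
        total += f(t + tau) * f(t)
    return total * dt / (2.0 * half_span)


wave_a = lambda t: math.sin(t) + math.sin(2.0 * t)
wave_b = lambda t: math.sin(t) + math.cos(2.0 * t)

for tau in (0.0, 0.7, 2.0):
    expected = 0.5 * math.cos(tau) + 0.5 * math.cos(2.0 * tau)
    print(tau, round(autocorrelation(wave_a, tau), 3),
          round(autocorrelation(wave_b, tau), 3), round(expected, 3))
```

The two waves differ in shape (at t = 0 one is zero and the other is one), yet their estimated autocorrelations agree to within the finite-interval error.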
The significance of the autocorrelation be-
comes apparent as soon as we investigate the
error in prediction. In many mathematical sit-
uations involving linear systems it is conven-
ient to deal with the square of the error rather
than with the error itself, since a first varia-
tion in the error squared expression gives a
linear relationship in the quantities of direct
interest. We will deal with the square of the
error here. If E represents the instantaneous
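error, f*(t + tf) − f(t + tf), the mean square
error over a long period of time is evidently

\[ \begin{aligned}
\overline{E^2} &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left[ f^*(t + t_f) - f(t + t_f) \right]^2 dt \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left[ f(t + t_f) \right]^2 dt
 - \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f) f^*(t + t_f)\, dt \\
&\quad + \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left[ f^*(t + t_f) \right]^2 dt.
\end{aligned} \tag{6} \]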
The first integral in equation (6) can be
evaluated immediately. From (2) it is φ1(0).
To evaluate the second integral replace f*(t
+ tf) by its definition from (1). This gives

\[ \begin{aligned}
&- \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, dt \int_0^\infty \left[ f(t - \tau) + g(t - \tau) \right] dK(\tau) \\
&\quad = - \lim_{T \to \infty} \frac{1}{T} \int_0^\infty dK(\tau) \int_{-T}^{T} \left[ f(t + t_f) f(t - \tau) + f(t + t_f) g(t - \tau) \right] dt
\end{aligned} \]

if we reverse the order of integration. Since
we assume that f and g are uncorrelated, however,
the product f(t + tf)g(t − τ) in this expression
makes no contribution to the final result,
and by replacing the integral of f(t + tf)
f(t − τ) by its value in terms of φ1 the expression
as a whole can be written as

\[ -2 \int_0^\infty \phi_1(t_f + \tau)\, dK(\tau). \]
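The third integral in (6) can be simplified in
similar fashion. The final result becomes

\[ \overline{E^2} = \phi_1(0) - 2 \int_0^\infty \phi_1(t_f + \tau)\, dK(\tau)
 + \int_0^\infty dK(\tau) \int_0^\infty \left[ \phi_1(\tau - \sigma) + \phi_2(\tau - \sigma) \right] dK(\sigma). \tag{7} \]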
The only quantities appearing in equation
(7) are the autocorrelations, φ1 and φ2, of the
true target path and the observational error,
and the function K which specifies the data-smoothing
structure. The theoretical problem
with which we are confronted is evidently that
of choosing K to make the mean square error
as small as possible for any given φ's. This
problem will not be attacked here, although a
solution obtained by a somewhat indirect
method is presented in the next chapter. The
principal reason for deriving equation (7) is
to demonstrate the very important fact that
the mean square error depends only upon the
two autocorrelations. No other characteristics
of the input data need be considered.
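A sampled version of equation (7) shows how the mean square error is computed from the two autocorrelations alone. The sketch is an illustration under assumptions: the integrals are replaced by sums over equally spaced lags, `weights[k]` stands in for the increment dK at lag k·dt, and the example autocorrelation is that of sin ωt from (5).

```python
import math


def mean_square_error(phi1, phi2, weights, dt, time_of_flight):
    """Discrete form of equation (7) for a given set of weights."""
    lags = [k * dt for k in range(len(weights))]
    first = phi1(0.0)
    second = -2.0 * sum(w * phi1(time_of_flight + tau)
                        for w, tau in zip(weights, lags))
    third = sum(wj * wk * (phi1(tj - tk) + phi2(tj - tk))
                for wj, tj in zip(weights, lags)
                for wk, tk in zip(weights, lags))
    return first + second + third


omega = 0.5                                       # assumed value
phi1 = lambda tau: 0.5 * math.cos(omega * tau)    # autocorrelation of sin(omega t)
no_noise = lambda tau: 0.0

# "Predict" by repeating the present value: a single unit weight at lag 0.
print(mean_square_error(phi1, no_noise, [1.0], dt=1.0, time_of_flight=2.0))
print(1.0 - math.cos(omega * 2.0))                # direct calculation for comparison
```

For this trivial predictor the error is the mean square of sin ω(t + tf) − sin ωt, which equals 1 − cos ωtf; the two printed values agree. No property of the input other than φ1 and φ2 enters the function.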
It will be recalled that the mean square cri-
terion was introduced originally on the ground
of mathematical convenience. This leaves un-
settled the question of how good a measure of
performance for a data-smoothing network it
actually is. This is a critical question, since
upon it depends the validity of the whole ap-
proach outlined in this chapter. A priori, the
least squares criterion is a dubious one since
it gives principal weight to large errors. In
fire control we are normally interested only in
shots which are close enough to register as hits.
If a shot misses it makes little difference
whether the miss is large or small. The merits
of the least squares criterion are considered
in more detail in Chapter 9, where the conclu-
sion is reached that the criterion is probably
adequate for many problems but needs to be
supplemented or replaced in others, including
the special case of heavy antiaircraft fire to
which particular attention is given in this
monograph. Pending the discussion in Chapter
9, the least squares criterion will be assumed
to be a valid one, with the understanding that
the analysis is intended primarily for its value
in contributing to the general understanding of
the data-smoothing problem rather than as a
means of fixing the exact proportions of an op-
timal smoothing network.
7.6 DATA SMOOTHING AS A FILTER PROBLEM
The time-series approach to data smoothing
is closely associated with another which at first
sight may seem quite different. This second
approach is suggested by the procedures used
in communication engineering. Here the sig-
nals, be they voice, music, television, or what
not, are again time series. Instead of dealing
with actual signals varying in a more or less
irregular and random manner with time, how-
ever, it is customary to deal with their equiva-
lent steady-state components on the frequency
spectrum.ᵇ
The analysis of data smoothing can conven-
iently be approached by supposing that both
the true path of the target and the effects of
tracking errors are represented, in a similar
way, by their frequency spectra. When the
situation is presented in this way, however,
there is an obvious analogy between the prob-
lem of smoothing the data to eliminate or re-
duce the effect of tracking errors and the prob-
lem of separating a signal from interfering
noise in communication systems. We may take
as an example of the latter the transmission
of voice or music by ordinary radio over fairly
long distances, so that the effects of static in-
terference are appreciable. In such a system
a reasonable separation of the desired signal
from the static can be obtained by means of
a filter. In a representative situation an ap-
propriate filter might transmit frequencies up
to perhaps 2,000 or 3,000 cycles per second,ᶜ
while rejecting higher frequencies.
The choice of any specific cutoff, such as
2,000 or 3,000 c, in the radio system depends
upon a compromise between conflicting consid-
erations. Both speech or music and static nor-
mally include components of all frequencies
which can be heard by the human ear. Thus,
suppressing any frequency range below the
limits of audibility, at perhaps 10,000 or 20,000
c, will injure the signal to some extent. The
intensity of the signal components, however,
diminishes rapidly above 2,000 or 3,000 c, while
the energy of the static interference is more
evenly distributed over the spectrum. Thus, by
passing only the first 2,000 or 3,000 c, we can
retain most of the signal while rejecting most
of the noise. Naturally, the exact dividing line
will depend upon the relative levels of signal
and noise power. If the static interference is
quite weak, for example, it would be worth
b The review of communication theory given in Ap-
pendix A shows how this equivalence is established by
Fourier or Laplace transform methods.
c In practice, of course, the filtering would probably
take place in the radio-frequency circuits, but it is
more convenient here to think of it occurring in the
demodulated output.
while to transmit a considerably wider band
in order to retain a more nearly perfect signal.
If the static level is extremely high, on the
other hand, it would be necessary to transmit a
still narrower band at the cost of greater mu-
tilation of the signal.
The separation of the true path of a target
from the observed path including tracking
errors, as a preliminary to prediction of the
future position of the target, presents an ap-
proximately analogous situation. Again the
spectrum of the "signal" or true path is con-
centrated principally in a low-frequency band,
in most instances, while the energy of tracking
errors or "noise" appears principally at con-
siderably higher frequencies. Thus the two can
be separated by a low-pass filter. The separa-
tion, however, is not complete since some com-
ponents of the signal spectrum extend into the
noise region. Thus the smoothing process must
be accompanied by some mutilation of the sig-
nal, and the optimum compromise is again
attained from a filter which transmits a rela-
tively broad band when the tracking errors are
of low intensity and a much narrower band
when they are large.
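The compromise between noise rejection and mutilation of the signal can be illustrated with a very simple filter. The sketch below is an assumption-laden toy: a single-pole recursive low-pass filter stands in for the data-smoothing network, the "true path" is a 0.1 c sinusoid, and the "tracking error" is a 2 c sinusoid; none of these choices comes from the monograph.

```python
import math


def low_pass(samples, alpha):
    """Single-pole recursive low-pass filter; a smaller alpha gives a narrower band."""
    state = samples[0]
    output = []
    for x in samples:
        state += alpha * (x - state)
        output.append(state)
    return output


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


DT = 0.01                                                        # seconds per sample
TIMES = [k * DT for k in range(6000)]                            # one minute of data
SIGNAL = [math.sin(2 * math.pi * 0.1 * t) for t in TIMES]        # "true path" at 0.1 c
NOISE = [0.3 * math.sin(2 * math.pi * 2.0 * t) for t in TIMES]   # "tracking error" at 2 c


def trade_off(alpha):
    """Return (residual noise, signal mutilation), both as rms values.

    The filter is linear, so signal and noise can be passed through it
    separately and the two effects examined one at a time.
    """
    residual_noise = rms(low_pass(NOISE, alpha))
    smoothed_signal = low_pass(SIGNAL, alpha)
    mutilation = rms([y - s for y, s in zip(smoothed_signal, SIGNAL)])
    return residual_noise, mutilation


for alpha in (0.2, 0.02, 0.002):
    noise_left, mutilation = trade_off(alpha)
    print(f"alpha={alpha}: residual noise {noise_left:.3f}, mutilation {mutilation:.3f}")
```

Narrowing the band reduces the residual noise but increases the distortion of the signal, which is the compromise described above.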
In these terms the most obvious difference
between the data-smoothing problem and the
static interference problem in the radio system
is in the order of magnitude of the frequencies
involved. They are roughly 10,000 times smaller
in the data-smoothing case. Thus, the typical
signal band in a fire-control system may cover
a few tenths of a cycle per second, in compari-
son with a useful band of 2,000 or 3,000 c in a
radio system, and the spectrum of tracking
errors or noise, with representative tracking
devices, includes appreciable components up to
perhaps 2 or 3 c, in comparison with a total
effective noise band in the radio system ex-
tending to the limits of audibility at perhaps
20,000 c.
This analogy between data smoothing and
the filtering problems which appear in ordi-
nary communication systems transmitting
speech or music must of course not be carried
too far. For example, previous experience with
communication filters is of no help in fixing in
detail the cutoff in attenuation characteristic
of the data-smoothing filter, since in communi-
cation systems these choices depend on psycho-
logical considerations of no relevance in the fire-
control problem. Methods of determining the
best rules for proportioning a data-smoothing
filter, therefore, remain to be determined. We
may also notice that, whereas the time-series
approach was of the data-smoothing and pre-
diction type, the filter approach emphasizes
data smoothing only. The addition of the pre-
diction function can be expected to change ma-
terially the overall characteristics of the cir-
cuit. Neither of these remarks, however, robs
the filter approach of its value as a simple way
of thinking about the problem qualitatively.
7.7 RELATION BETWEEN TIME-SERIES AND FILTER APPROACHES
The time-series and filter methods of looking
at data smoothing are related to one another
by the fact that the autocorrelation can be com-
puted from the amplitude spectrum, or vice
versa, by Fourier transform means. Consider,
for example, the Fourier transform of the
autocorrelation. If we make use in particular
of (4) we have
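\[ \begin{aligned}
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi_{1N}(\tau)\, e^{-i\omega\tau}\, d\tau
&= \frac{1}{\sqrt{2\pi}\, W} \int_{-\infty}^{\infty} e^{-i\omega\tau}\, d\tau \int_{-\infty}^{\infty} f(t + \tau) f(t)\, dt \\
&= \frac{1}{\sqrt{2\pi}\, W} \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt \int_{-\infty}^{\infty} f(t + \tau)\, e^{-i\omega(t + \tau)}\, d\tau \\
&= \frac{\sqrt{2\pi}}{W}\, |F(\omega)|^2
\end{aligned} \tag{8} \]

where W = ∫[f(t)]² dt is the total energy appearing in (4) and

\[ F(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt
 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t + \tau)\, e^{-i\omega(t + \tau)}\, d\tau. \tag{9} \]

F(ω) is of course the steady-state spectrum
of the signal f(t). Equation (8) thus states
that the Fourier transform of φ1N is equal to a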
constant times the square of the amplitude of
the steady-state spectrum. The amplitude
squared spectrum is, however, a measure of
the power per cycle. The relation is therefore
equivalent to the statement that the autocorre-
lation and power spectrum are Fourier trans-
forms of each other.
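The transform relation can be checked on sampled data. The sketch below swaps in a discrete, periodic analogue: it uses the discrete Fourier transform and a circular (wrap-around) autocorrelation, without the normalizing constants of the continuous formulas, so it illustrates the relation rather than reproducing equation (8) itself. All names and the test sequence are assumptions.

```python
import cmath
import math


def dft(values):
    """Direct discrete Fourier transform (adequate for short sequences)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(values))
            for k in range(n)]


def circular_autocorrelation(values):
    """Sum over m of x[m + lag] * x[m], with indices taken modulo the length."""
    n = len(values)
    return [sum(values[(m + lag) % n] * values[m] for m in range(n))
            for lag in range(n)]


samples = [math.sin(0.3 * m) + 0.5 * math.cos(1.1 * m) for m in range(64)]
power_spectrum = [abs(c) ** 2 for c in dft(samples)]
transform_of_autocorrelation = dft(circular_autocorrelation(samples))

worst = max(abs(a - b) for a, b in zip(transform_of_autocorrelation, power_spectrum))
print(f"largest discrepancy: {worst:.2e}")
```

The discrepancy is at the level of rounding error. Since the power spectrum discards the phase of each component, the same autocorrelation results whatever the phases are.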
Since we have already established the fact
that the mean square error in prediction de-
pends only on the autocorrelation, this analysis
enables us to conclude immediately that the
mean square error can also be calculated from
the power spectra of the signal and noise. It
is entirely independent of the phase relations
in either signal or noise. The phase character-
istics of the data-smoothing network, which
operates on the signal after a specific wave
shape has been established, are, of course, still
of consequence.
7.8 PHYSICAL AND TACTICAL CONSIDERATIONS
Thus far the material which has been pre-
sented has been primarily mathematical. It
has consisted, in other words, of outlines of
general analytical methods which are available
for use with the data-smoothing problem. It is
also possible to approach the problem in a
much more concrete fashion. It is obvious that
by giving thought to the details of the physical
characteristics of tracking units and targets,
and to the tactical situations with which we
expect to deal, it should be possible to draw a
number of specific conclusions about the prob-
lem as a whole. In a general theory of the de-
sign and tactical use of fire-control apparatus
such an approach might well be a primary one.
It is scarcely possible to follow it in detail in
the present discussion. The following para-
graphs, however, indicate some of the kinds of
considerations which can be brought into the
problem in this way. It will be seen that they
tend to modify the strictly mathematical ap-
proach, partly by qualifying to some extent the
assumptions made in the mathematics, and
partly by tending to give much more emphasis
to particular aspects of the problem than would
appear in a general analytic outline.
Choice of Coordinates
One of the most obvious omissions in the
general analysis thus far is any consideration
of the choice of coordinates in which the data
smoothing is to take place. So far as either
the statistical or filter theory is concerned, the
coordinates in the data smoother may repre-
sent either the original tracking data or any
transformation of them. The fact that there is
actually something to be decided here, however,
is easily seen from the long-range antiaircraft
problem. The input tracking coordinates for
antiaircraft would normally be azimuth, eleva-
tion, and slant range. If the airplane flies in a
straight line roughly overhead, the general
shape of the azimuth and the azimuth rate as
functions of time are given by the curves in
Figure 2. The curves become indefinitely
steeper as the target path approaches the
zenith, and it will be seen that if the approach
is reasonably close, either the azimuth or the
azimuth rate must include a very substantial
amount of high-frequency energy. Since the
possibility of an effective separation between
the signal and noise in the filter approach depends
upon the assumption that the signal components
are of quite low frequency with respect
to the noise, the presence of this high-frequency
energy is evidently serious.
[Figure 2. Azimuth and azimuth rate for crossing
target. Curves: azimuth A (mils) and azimuth rate
(mils/sec) plotted against time t (secs).]
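The steepening can be checked numerically. The sketch below assumes a target in level, straight-line flight at constant speed which passes the director at a horizontal miss distance d, with azimuth measured from the direction of closest approach. Under these assumptions the azimuth is arctan(vt/d) and its peak rate is v/d, which grows without limit as d shrinks. The function name, the speed of 150 yards per second, and the miss distances are illustrative choices and do not come from the text.

```python
import math

MILS_PER_RADIAN = 3200.0 / math.pi  # 6400 mils to the full circle


def azimuth_and_rate(t, speed, miss_distance):
    """Return (azimuth in mils, azimuth rate in mils/sec) at time t seconds.

    t = 0 is the instant of closest approach; azimuth is measured from the
    direction of closest approach (an assumed convention).
    """
    u = speed * t / miss_distance
    azimuth = math.atan(u)
    rate = (speed / miss_distance) / (1.0 + u * u)
    return azimuth * MILS_PER_RADIAN, rate * MILS_PER_RADIAN


# Hypothetical miss distances in yards: the closer the pass, the sharper the peak.
for miss in (2000.0, 500.0, 100.0):
    _, peak = azimuth_and_rate(0.0, 150.0, miss)
    print(f"miss distance {miss:6.0f} yd: peak azimuth rate {peak:7.1f} mils/sec")
```

The same straight path expressed in rectangular coordinates has constant velocity components, which is the sense in which the rectangular system avoids these accidental high-frequency terms.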
When the target describes a violently evasive
path the signal spectrum must naturally in-
clude substantial high-frequency components,
whatever the coordinate system may be. The
high-frequency components indicated in Figure
2, however, are due to the fact that the target
path happens to pass almost over the director
and are essentially superimposed upon the
high-frequency components which reflect the
complexity of the target path itself. It is clear
as a matter of principle that an acceptable
coordinate system for data smoothing should
not introduce frequency components which de-
pend upon such accidental factors as the loca-
tion and orientation of the coordinate system.
The rectangular system mentioned in connec-
tion with Figure 1 evidently meets this condi-
tion; so also does the "intrinsic" system de-
scribed in the next section.
Physical Limitations of Target or Tracker
We may also approach the data-smoothing
question by a consideration of the motions
which are physically possible either in the
target or in the tracking device. In the heavy
antiaircraft problem, for example, there are
substantial physical limitations on the performance
possibilities of present-day aircraft.
We can be quite sure that any motion incom-
patible with these limitations is necessarily a
tracking error and can be removed from the
incoming data. Naturally, these limitations
must appear in the power spectrum of the sig-
nal if they affect the mean square error in pre-
diction, so that their existence in no way dis-
putes the mathematical framework we have
set up. Consideration of the physical factors
which produce them, however, may permit
them to be established more easily or in more
clear-cut fashion than would be possible from
a statistical examination of target records
alone.
The limitations on airplane performance
can be stated most simply when the motion of
the airplane is expressed in so-called intrinsic
coordinates. These are the speed of the air-
plane, its heading, and its angle of dive or
climb. The maneuvering possibilities of a con-
ventional airplane in these three directions are
quite unequal. By banking sharply it can
maneuver violently to the right and left and
thus make quick changes in heading. The pos-
sibilities of maneuvering up and down, how-
ever, are considerably less, particularly for a
heavy airplane, where there are usually restric-
tions on the maximum angle of dive or climb
which can be assumed. The possibilities of
quickly changing the speed of the airplane,
finally, are almost nil. The thrust of an air-
plane propeller is so small in comparison with
the mass of the airplane that only small accelerations
are possible.*
Thus the optimum filters for the three coor-
dinates should be different. The one for speed
can have a very narrow band, since most of
the signal energy for this coordinate occurs at
very low frequencies. The optimum band for
the angle of dive or climb, however, should be
larger (unless it turns out that pilots seldom
make use of maneuvering possibilities in this
direction) and the one for the heading larger
still. In this ability to discriminate among the
various possible directions of motion the in-
trinsic coordinate system is evidently an im-
provement even on the rectangular system.
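As a minimal sketch of the transformation involved, the following converts rectangular velocity components into the three intrinsic coordinates. The axis convention (x east, y north, z up), the measurement of heading clockwise from north, and the function name are assumptions made for illustration; the text does not specify them.

```python
import math


def intrinsic_coordinates(vx, vy, vz):
    """Convert a Cartesian velocity into (speed, heading, climb angle).

    Assumed axes: vx east, vy north, vz up. Angles are in radians, with
    heading measured clockwise from north and climb angle positive upward.
    """
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    heading = math.atan2(vx, vy)
    climb = math.atan2(vz, math.hypot(vx, vy))
    return speed, heading, climb


# A hypothetical target flying due east while climbing at 45 degrees.
print(intrinsic_coordinates(100.0, 0.0, 100.0))
```

Each of the three outputs could then be smoothed with its own bandwidth, narrow for speed and wider for climb angle and heading, as the paragraph above suggests.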
Settling Time
Another aspect of the data-smoothing prob-
lem which has not been given conspicuous at-
tention in the purely mathematical discussion
is the fact that in an actual tactical situation
questions of elapsed time are of great importance.
Engagements usually begin suddenly
and last for a comparatively brief period, and
it is important to find a data-smoothing scheme
which provides adequate firing data as quickly
as possible after an engagement starts. A situ-
ation essentially similar to the beginning of an
engagement may also be presented whenever
the target makes a sudden change of course or
whenever it is necessary to shift from one
target to another in a given attacking body.
The time required for a computer to give
usable output data after any of these events is
its so-called "settling time," and is one of the
most important parameters of any data-
smoothing system. It is possible to make rough
estimates of settling time by indirect means in
both the statistical and filter theories of data
smoothing, but no explicit consideration of
necessary time lapses appears in either theory.
Evidently, the fundamental fault lies with the
"stationary" assumption.
* This ignores the possibility of changing the speed
through gravitational forces. Since these possibilities
are linked to the angle of dive or climb, however, they
can be predicted. This has actually been done in one
experimental computer.
Effect of Human Factors
Aside from the conditions on target perform-
ance which arise from the physical character-
istics of the target itself, there are others
which are due to the fact that the target is
under the control of a human being with a
definite purpose. The language of the statistical
and filter methods is broad enough to cover
almost any situation. It tends to suggest, how-
ever, that the typical target paths with which
we deal are the relatively structureless conse-
quences of random physical forces. The inter-
vention of purposive human behavior, on the
other hand, tends to give paths which fall into
more or less definite patterns. A simple illus-
tration is furnished by the argument which is
frequently offered in defense of the straight
line assumption in dealing with antiaircraft
defense against heavy bombers. It is contended
that while the targets may in fact engage in
substantial evasive maneuvers during most of
their flight, there will always be a substantial
period during the bombing run in which they
must fly very straight in order to achieve
bombing accuracy. On the basis of ordinary
probability we would of course expect substan-
tial straight line segments quite infrequently
if the course as a whole shows marked disper-
sion, and the intervention of the human pilot
thus provides a higher degree of structure than
one would expect in a corresponding situation
dominated by purely natural factors.
A broader example is furnished by a com-
parison of two airplanes, or perhaps more
simply of two boats, one of which is under the
control of a human operator, while in the other
the steering controls are lashed in a neutral
position. Both boats, say, may be expected to
experience small variations of course due to the
random effects of wind and waves upon them.
Over a short period of time the observed mo-
tions of the two boats should be substantially
identical. In the case of the boat with the
lashed helm these random variations will tend
to accumulate, so that it is possible to make a
reasonable prediction of the position of the
boat for only a comparatively short distance
in the future. In the boat with the human
steersman, on the other hand, we may expect
corrections to be applied as soon as the random
effects become large, so that the boat tends to
retain the same general course and it is pos-
sible to predict its position hours or even days
later from a relatively brief observation.
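The contrast between the two boats can be sketched with a deliberately simple model, which is an assumption of this illustration and not part of the analysis above. At each step the heading deviation receives an independent random disturbance of fixed variance, and the steersman removes a fixed fraction of the accumulated deviation. With the helm lashed (no correction) the variance grows in proportion to elapsed time; with correction it levels off.

```python
def heading_variance(steps, noise_var, correction):
    """Variance of the heading deviation after a number of disturbances.

    correction = 0.0 models the lashed helm; 0 < correction <= 1 is the
    fraction of the accumulated deviation a steersman removes per step.
    Both the model and its parameters are hypothetical.
    """
    retained = 1.0 - correction
    variance = 0.0
    for _ in range(steps):
        variance = retained * retained * variance + noise_var
    return variance


for steps in (1, 10, 100, 1000):
    lashed = heading_variance(steps, 1.0, 0.0)
    steered = heading_variance(steps, 1.0, 0.5)
    print(steps, lashed, round(steered, 4))
```

After a single disturbance the two cases agree, which corresponds to the remark that over a short period the observed motions should be substantially identical.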
Neither of these illustrations is inconsistent
with the mathematical framework laid down
earlier in the chapter, in a purely theoretical
sense. For example, the bombing run illustration
merely states that because of the presence
of the human operator there are definite phase
relations in the input signal. As we have seen,
such relations can exist without affecting computations
based on mean square error. The
comparison between the piloted and pilotless
boats can be interpreted as the result primarily
of differences in the signal power spectrum.
In the case of the pilotless boat, for example,
the signal occupies a fairly continuous low-frequency
band, while in the case of the piloted
boat it must be regarded as concentrated very
closely around zero frequency, so that it is approximately
a line spectrum superimposed on
a continuous one. The formal mathematical
theory covers also such cases as these.
The point of this discussion, however, is that
the mathematical theory, although it is sufficiently
general in a formal sense, fails to differentiate
between such situations as those
just described and the more shapeless sort
involving continuous spectra with random
phase relations, even if the special features in
these situations may be the controlling factors
in determining the actual probability of hitting.
If we could believe the bombing run
hypothesis, for example, and had a sufficiently
accurate computer and gun, we could expect
to score a hit in every engagement, no matter
how large the mean square error might be.
More generally, it is probably only the tendency
of targets to exhibit "line spectra" which
prevents the real probability of a kill, small
at best, from becoming microscopic. It is necessary
to lay special emphasis on these factors
in order to keep the overall fire control picture
in perspective.
CRITERION OF PERFORMANCE
Last on this list of doubts about the statistical
and filter theories, we may mention the
least squares criterion of accuracy. This was
discussed before, but it is mentioned again as
a matter of emphasis, and because of its close
relation with the factors we have just discussed.
For example, the bombing run illustration
obviously represents one situation in
which the mean square error is not a good
guide to the actual probability of scoring a hit.
Chapter 8
STEADY-STATE ANALYSIS OF DATA SMOOTHING
It was shown in the previous chapter that
both the statistical and filter theory ways of
looking at the data-smoothing problem lead
naturally to an analysis in terms of the power
spectra of the signal and noise. The phase rela-
tions are not important as long as we accept
the mean square error as a criterion of per-
formance. The inadequacies of the mean square
criterion will finally force us to abandon the
steady-state attack in favor of a direct analysis
in terms of the wave shapes of some assumed
signals. The steady-state attack is nevertheless
a very useful one. This chapter will conse-
quently continue the analysis from this point
of view. It will be assumed as heretofore that
the heavy antiaircraft problem is the particular
subject of interest.
A large part of the discussion hinges upon
the conditions which must be satisfied by the
external characteristics of an electrical net-
work if it is to be capable of physical realiza-
tion in any way whatever. These limitations
and the characteristics which may be postulated
for physical networks are decisive since, in the
absence of such restrictions, no limits could be
set upon the performance which might be ex-
pected from data-smoothing and predicting
circuits. The facts about physically realizable
networks which we shall find of most use are
summarized below, but the reader not familiar
with this field is urged to read also the account
given in Sections A.9 and A.10, Appendix A.
The conditions which must be satisfied by
physically realizable networks can be stated in
either transient or steady-state terms. In tran-
sient terms they are expressed most simply by
the statement that the response of a physical
network to an impulsive force must be zero up
to the time the force is applied. Thus the net-
work has no power to predict a purely arbi-
trary event. That is, it has no way of foresee-
ing whether or not an impulse is actually going
to be applied to it. This characteristic of physi-
cal networks is taken as a postulate.
The steady-state limitations on physical networks
are expressed in terms of their attenuation
and phase characteristics. They may be
derived either from the transient specification
or from the postulate that a physical network
must be stable. There are no important limita-
tions to be placed upon the attenuation and
phase characteristics of physical networks as
long as we deal with these characteristics separately,
but there are very severe limitations on
the phase characteristic which can be associated
with any given attenuation characteristic or
vice versa. In particular, when the attenuation
characteristic is prescribed, there is a definite
formula for calculating the unique limiting
phase characteristic with which it may be associated.
This is the so-called "minimum phase"
characteristic because any other physical net-
work having the postulated attenuation char-
acteristic must have as great or greater phase
shift at every frequency. As we shall see later,
this greater phase characteristic would corre-
spond to longer lags in obtaining usable data,
so that the minimum phase characteristic is
the optimum for a data-smoothing network.
The minimum phase characteristic has the addi-
tional important property that not only does
it specify the transfer admittance of a physical
network, but the reciprocal of that transfer
admittance can also be realized by a physical
structure.*
In addition to this principal formula for the
relation between attenuation and phase there
are a number of subsidiary expressions for
special aspects of the problem. One in partic-
ular, relating the attenuation to the behavior
of the phase characteristic in the neighborhood
of zero frequency, is used extensively in this
chapter.
* In limiting cases, such as may be found when the
transfer admittance contains zeros or poles exactly on
the real frequency axis, the "physical structure" may
require such constituents as ideally nondissipative re-
actances, perfect amplifiers with unlimited gain, etc.
This, however, is of no consequence for the present
general discussion.
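One way to see the connection between attenuation and minimum phase numerically is the following sketch. It works with a sampled (discrete-time) response rather than the continuous networks discussed here, and it recovers the phase from the logarithm of the magnitude by folding the real cepstrum, a standard signal-processing device that is swapped in here for the loss-phase formula of Appendix A. The two-term test response and all names are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for a small sketch."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def minimum_phase(magnitude):
    """Phase (radians) of the minimum-phase response with the given magnitude.

    `magnitude` holds |H| on a full N-point frequency grid, N even, no zeros.
    """
    n = len(magnitude)
    cepstrum = [c.real for c in dft([math.log(m) for m in magnitude], inverse=True)]
    # Keep the zero-time term, double the positive-time half, drop negative time.
    folded = [0.0] * n
    folded[0] = cepstrum[0]
    folded[n // 2] = cepstrum[n // 2]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cepstrum[i]
    return [c.imag for c in dft(folded)]


N = 128
# Hypothetical minimum-phase test response (its single zero lies at z = 0.5).
response = dft([1.0, -0.5] + [0.0] * (N - 2))
# Time-reversed version: identical magnitude, but not minimum phase.
reversed_response = dft([-0.5, 1.0] + [0.0] * (N - 2))

magnitude = [abs(h) for h in response]
recovered = minimum_phase(magnitude)
worst = max(abs(r - cmath.phase(h)) for r, h in zip(recovered, response))
print("largest phase discrepancy (radians):", worst)
```

The reversed response has the same magnitude at every frequency, so the magnitude alone cannot distinguish the two; the procedure returns the phase of the first, which is the one with the smaller lag.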
8.1 THE SIGNAL SPECTRUM
It is natural to begin with a discussion of the
spectrum of a typical target path. Unfortu-
nately no data on the spectra of actual meas-
ured airplane paths exist, and the theoretical
assumptions which may be made about paths
of airplane targets are best discussed in the
next chapter. This section consequently will be
confined to rather general observations about
the problem. It will be convenient to assume
for definiteness that the quantities to be
smoothed are the velocity components in Car-
tesian coordinates.
The simplest point of departure is furnished
by the conventional assumption that the target
flies in a straight line at constant speed. If we
could construe this assumption literally, it
would mean that the velocity spectrum in rec-
tangular coordinates would reduce to a single
line at zero frequency. In practice, of course,
the spectrum is not so simple. Even in the
absence of deliberate maneuvering, the target
will fly a slightly curved path because of
"wander." Moreover, even if the target could
fly exactly straight, the single line spectrum
would apply only to a straight course in-
definitely continued. The spectrum becomes
more complicated if we consider the fact that
tracking must have begun at some finite time
in the past, or that the target may presumably
change occasionally from one straight line
course to another.
As a result of both these causes, the actual
signal spectrum must be regarded as occupying
a band bordering on zero frequency. The distri-
bution of energy in detail will, of course,
depend on particular circumstances. The band
has no very well defined upper limit, but in
most cases the great bulk, at least, of the
energy should be below, say, one-fourth or one-
fifth of a cycle per second. For example, the
natural periods of a heavy airplane, which one
would expect to be correlated with wander, are
below this limit. This limit is also sufficient to
include most of the energy resulting from
changes in course occurring as frequently as
every ten or twenty seconds.
In general, it is to be supposed that the signal
spectrum varies as $\omega^{-n}$, where n may be
1, 2, 3, depending on the frequency range. This
follows from general considerations of the
limitations of airplane performance. Thus, if
we suppose that the velocity changes discontinuously
from time to time, it follows from
general Fourier principles that the amplitude
must vary as $\omega^{-1}$. This is presumably a fair
representation of the actual signal spectrum at
low frequencies. At moderate frequencies, however,
we must take account of the fact that the
velocity can actually be changed rapidly but
not discontinuously, and we consequently
assume that the amplitude begins to vary as
$\omega^{-2}$. Finally, at frequencies of the order of perhaps
one cycle per second one must take account
of the fact that the airplane must bank
in order to turn. Since it takes some time to roll
into the bank, even the acceleration in the lateral
direction cannot be discontinuous, and
consequently the amplitude must begin to vary
as $\omega^{-3}$. The application of such successive limiting
factors in constructing a complete spectrum
is described in more detail in Section A.8
of Appendix A.
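The Fourier principle invoked here can be illustrated with two periodic test waveforms. Periodicity is an assumption made only for convenience, since actual target paths are not periodic, and the waveforms and names are hypothetical. A velocity with jumps (a square wave) has harmonic amplitudes falling as 1/k, while a velocity that is continuous but has corners (a triangular wave) has amplitudes falling as 1/k^2.

```python
import math


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic of one period of a sampled waveform."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


N = 4096
# Velocity with discontinuous jumps (square wave).
jumps = [1.0 if i < N // 2 else -1.0 for i in range(N)]
# Velocity that is continuous but has discontinuous slope (triangular wave).
corners = [1.0 - 4.0 * abs(i / N - 0.5) for i in range(N)]

for k in (1, 3, 5, 7):
    print(k, harmonic_amplitude(jumps, k), harmonic_amplitude(corners, k))
```

Tripling the frequency reduces the first amplitude by a factor of about three and the second by a factor of about nine, in line with the $\omega^{-1}$ and $\omega^{-2}$ laws.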
One other general condition of the same kind
can be mentioned. It can be shown that the
integral from zero to infinity of $\log H/(1 + \omega^2)$,
where H is the power spectrum, is very important
in determining the properties of a time
series. More explicitly, the integral converges
if the series is essentially statistical, so that we
cannot foretell the future from the past with
absolute certainty. This of course is the case
with an actual signal spectrum in a fire-control
problem. It implies two consequences: first,
that H cannot be zero over any finite band; and
second, that in the neighborhood of infinite frequency
H diminishes slowly enough so that
$|\log H|/\omega \to 0$.
8.2 THE NOISE SPECTRUM
The spectrum of tracking errors depends
largely upon the particular sort of tracking
equipment involved. Broadly speaking, optical
tracking equipment (at least that of the present
or recent past) tends to produce tracking errors
not only of small amplitude, but also of low
frequency, so that they are hard to separate
from the signal spectrum. Radar equipment, of
the present time, produces higher-frequency
errors. Relatively high-frequency errors are
particularly likely to be found in very stiff
automatic tracking radars.
A number of examples of spectra of tracking
errors are shown in Figures 1, 2, and 3. The
spectra are given directly in terms of range
and angle errors. To make them comparable
with the velocity spectra described previously
it would be necessary to multiply all amplitudes
by $\omega$. In addition, it would of course also be
necessary to multiply the angle rates by some
suitable range in order to compare them directly
with the yards-per-second rates we have
otherwise considered.
[Figure 1. Power spectrum of range errors of
experimental radar. RMS = 30 yds; median = 0.022 cps.
Power plotted against frequency.]
After multiplication by $\omega$, the radar spectra
appear to be about flat up to perhaps one cycle.
Beyond that point they no doubt drop off
slowly, although the accuracy of the data is not
sufficient to permit the situation to be stated
very exactly.
8.3 RANDOM NOISE FUNCTIONS
The properties of the signal and noise as we
assume them here can be conveniently
expressed by reference to the theory of so-called
"random noise" functions.* A random noise can
be defined as a function which has a definite
amplitude spectrum but completely random
phase characteristics. The theory of such functions
is well developed because of their frequent
occurrence in physics. It is probable that
neither our noise functions nor our signal functions
are, strictly speaking, random noise according
to this definition. Thus, there are probably
certain definite phase relations in our noise
functions because of the physical characteristics
of tracking devices. There is no evidence,
however, that any such relations are important
enough to be significant in the data-smoothing
problem, so that we are fully justified in identifying
them with random noise functions as
defined above. The phase relations in the signal
are by no means random. As long as we consider
only the mean square error, however, this
factor is immaterial, and we can replace the
actual signal by a random noise function with
the same power spectrum for purposes of
analysis.
[Figure 2. Power spectrum of angular height
errors of experimental radar. RMS = 1.0 mil;
median = 0.53 cps. Power plotted against frequency.]
* The fact that we also refer to tracking errors as
"noise" is, of course, merely a coincidence.
[Figure 3. Power spectrum of traverse errors of
experimental radar. RMS = 1.4 mil; median = 0.31 cps.
Power plotted against frequency.]
The most familiar example of a random
noise function is furnished by the thermal
voltage across a resistance R. This is a random
noise whose spectrum is constant up to very
high frequencies with the value $P = 4kTR$ (k
is Boltzmann's constant and T the absolute
temperature). A second example is black body
radiation. If there is black body radiation in a
space, the electric (or magnetic) field intensity
at a point is a random noise function with
spectrum
$$P(f) = \frac{8\pi h f^3}{c^3} \, \frac{1}{e^{hf/kT} - 1}$$
according to Planck's law. Random noise func-
tions also occur in the Schottky effect, in
Brownian motion, and in diffusion and heat
flow problems.
For purposes of analysis, a random noise
function can be thought of as a function made
up of a large number of sinusoidal components,
which are very closely spaced in frequency
and whose phases are completely random.21 231
Thus a random noise can be represented as
$$\sum_{n=1}^{N} a_n \cos(\omega_n t + \phi_n)$$
where $\omega_n = n\,\Delta f$, $\Delta f$ being the frequency difference
between adjacent components. The phase
angles $\phi_n$ are random variables which are independent
with a uniform probability distribution
from 0 to $2\pi$. As $\Delta f$ decreases the functions
in this ensemble approach, in a certain sense,
a limiting ensemble, providing the amplitudes
$a_n$ are adjusted properly. What is desired is to
have the total power in the neighborhood of
each frequency approach a certain limit $P(f)$,
the power spectrum at that frequency. To do
this we make
$$a_n^2 = 2P(f)\,\Delta f.$$
In the limiting ensemble the total power within
a small frequency range $\Delta f$ is then $P(f)\,\Delta f$.
The function $P(f)$ completely describes the
random noise ensemble from the statistical
point of view.
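The construction can be tried directly. The sketch below builds one member of such an ensemble and checks that its average power equals the sum of $P(f)\,\Delta f$ over the components. Frequencies are taken in cycles per second, so the argument of each cosine carries an explicit factor of $2\pi$; this convention, the low-pass example spectrum, the spacing of 0.1 cps, and the seed are all assumptions of the illustration.

```python
import math
import random


def synthesize(power_spectrum, delta_f, n_components, times, rng):
    """Sum of sinusoids at f_n = n * delta_f with independent random phases.

    Amplitudes follow a_n**2 = 2 * P(f_n) * delta_f.
    """
    components = []
    for n in range(1, n_components + 1):
        f_n = n * delta_f
        a_n = math.sqrt(2.0 * power_spectrum(f_n) * delta_f)
        phi_n = rng.uniform(0.0, 2.0 * math.pi)
        components.append((a_n, f_n, phi_n))
    return [
        sum(a * math.cos(2.0 * math.pi * f * t + phi) for a, f, phi in components)
        for t in times
    ]


def example_spectrum(f):
    """Hypothetical low-pass power spectrum, chosen only for illustration."""
    return 1.0 / (1.0 + f * f)


DELTA_F = 0.1            # cps
N_COMPONENTS = 20
N_SAMPLES = 200
PERIOD = 1.0 / DELTA_F   # the sum repeats after this many seconds
times = [i * PERIOD / N_SAMPLES for i in range(N_SAMPLES)]
wave = synthesize(example_spectrum, DELTA_F, N_COMPONENTS, times, random.Random(1))

mean_power = sum(v * v for v in wave) / N_SAMPLES
expected_power = sum(
    example_spectrum(n * DELTA_F) * DELTA_F for n in range(1, N_COMPONENTS + 1)
)
print(mean_power, expected_power)
```

Because the average is taken over one full period, the result does not depend on the particular phases drawn, which is one small instance of the statement that the power spectrum alone describes the ensemble statistically.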
A particularly important special case is that
of a random noise with a constant power spec-
trum. This is often called "flat" or "white"
noise. True constancy out to infinite frequencies
is of course impossible since it would imply an
infinite total power in the function. The idea
is, however, still useful and can be approxi-
mated, as with resistance noise, by having a
spectrum which is constant out to such high
frequencies that behavior beyond this point is
of no importance to the problem. We may con-
veniently think of flat random noise as being
made up of a succession of weak impulses oc-
curring frequently but at random times with
respect to one another. This results from the
fact that a Fourier analysis of a single impulse
gives a flat spectrum, and the random occur-
rence of many of them produces a random set
of phases. In a physical problem, such as resis-
tance noise or Brownian motion, these im-
pulses might correspond to the effects of indi-
vidual small particles. Such a situation is of
course completely chaotic. If the impulses are
large and occur relatively infrequently, the
power spectrum is still flat, though the func-
tion is no longer a random noise function as
defined here. This conception, which corre-
sponds to a physical situation including definite
causative elements, will be revived later under
the name of the elementary pulse method of
analysis.
Random noise functions have a number of
interesting characteristics. For example, they
have the "ergodic property." This means that
averaging a statistic along the length of a particular
random function gives the same results
as averaging the same statistic over an
ensemble of functions having the same power
spectrum. Each function is typical of the
ensemble. To be more precise one must admit
exceptions, but the probability of an exception
is zero. For example, if we determine the fraction
of time a given random function f(t) has
a value greater than some constant A, it will
be equal to the fraction of all functions in the
ensemble which are greater than A at t = 0
(with probability 1).
A second characteristic of random noise
functions is the fact that they frequently lead
to Gaussian or normal law distributions. For
example, the amplitudes of a random noise
function are distributed about zero in accordance
with the normal error law. Likewise, the
amplitudes for two points spaced a given dis-
tance apart form a two-dimensional normal
error law distribution when we consider all
possible positions of the first point. It is ap-
parent that if the signal and noise are actually
random functions the mean square error is as
good a criterion of performance as any other,
since it completely fixes the distribution in a
normal law case.
A final property of random noise functions
is the fact that if a random noise is passed
through a filter the output is still a random
noise. If the power spectrum of the noise is
$P(\omega)$ and the transfer characteristic of the
filter is $Y(i\omega)$, the output spectrum is
$P(\omega)\,|Y(i\omega)|^2$. In particular, if we take the
derivative of a random noise with spectrum
$P(\omega)$ we obtain one with spectrum $\omega^2 P(\omega)$.
This last property of random noise functions
suggests a method of representing them which
we shall find useful in the future. The method
is represented by Figure 4. It consists of a
source of flat noise followed by a shaping filter
to give the desired power spectrum. We can
easily assign to the filter the characteristics of
a physically realizable structure by making use
of the relations between attenuation and phase
mentioned earlier in the chapter. It is merely
necessary to convert the desired power spectrum
into a specification of the attenuation
characteristic of the filter and then use the
loss-phase formula to compute the corresponding
phase shift. It will be assumed that this
procedure has been followed when we make use
of this circuit at a later point.
[Figure 4. Circuit representation of random
functions: a flat noise source followed by a shaping filter.]
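The rule that a filter multiplies the power spectrum by the squared magnitude of its transfer characteristic is easily put in computational form. In the sketch below the one-pole low-pass shaping filter is a hypothetical choice made for illustration, not a filter proposed in the text, and the function names are likewise assumed.

```python
def output_spectrum(input_power, transfer):
    """Power spectrum after a filter with complex transfer characteristic Y(iw)."""
    return input_power * abs(transfer) ** 2


def one_pole_lowpass(omega, omega_c):
    """Hypothetical shaping filter Y(iw) = 1 / (1 + iw/wc)."""
    return 1.0 / complex(1.0, omega / omega_c)


def differentiator(omega):
    """Transfer characteristic of a differentiator, Y(iw) = iw."""
    return complex(0.0, omega)


FLAT_LEVEL = 2.0  # assumed level of the flat noise source
for omega in (0.0, 0.5, 1.0, 2.0, 4.0):
    shaped = output_spectrum(FLAT_LEVEL, one_pole_lowpass(omega, 1.0))
    print(omega, shaped, output_spectrum(shaped, differentiator(omega)))
```

With this particular filter the shaped spectrum is half its low-frequency value at the corner frequency, and differentiating the shaped output multiplies its spectrum by the square of the frequency.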
The method of representing random functions
shown by Figure 4 illustrates graphically
the basis of the prediction schemes described
thus far. The flat noise is of course absolutely
unpredictable. The history of the function up
to any given instant gives no indication of its
value even a microsecond later. The filter, how-
ever, forces the output current to have a cer-
tain structure on which a prediction may be
based. For example, if the filter will pass only
very low frequencies it is clear that the output
can change very little in a microsecond.
8.4 THEORETICAL PROPORTIONS FOR
A DATA-SMOOTHING FILTER
The signal and noise spectra furnish the raw
material from which a suitable data-smoothing
filter can be deduced. We have still to deter-
mine, however, the exact rule for choosing the
cutoff and attenuation characteristic of the
filter from these spectra. It is clear that previ-
ous experience with signal-to-noise problems
in systems transmitting voice or music is no
help, since the filter proportions here depend
upon psychological considerations of no rele-
vance to the fire-control problem. For example,
the interfering effect of a small amount of
noise is much greater than one might expect
from energy considerations, especially in in-
tervals of low message level, and it is con-
sequently worth while to maintain a relatively
high level of attenuation in the noise band.
Conversely, the breadth of the band required
for the message depends as much on the ability
of the ear to reconstruct a complete signal
from an incomplete one as it does upon the
actual signal power spectrum.
In the data-smoothing case a suitable crite-
rion, dependent upon more physical considera-
tions, can be obtained by minimizing the rms
error at the filter output. This criterion is
easily developed from the power spectrum ap-
proach, and in a sense it is, of course, the only
possible one as long as we follow the methods
developed thus far.
A very general theory for the minimization
of the rms error of the filter output has been
developed by Wiener.1 Since the power spec-
trum approach is not the one we shall eventu-
ally follow, however, it is not necessary to give
this analysis in detail. The nature of the relationships
can be seen from an elementary computation.
In Figure 5 let OA be a unit
vector representing the signal component at
some particular frequency. Let the amplitude
ratio between the input and output of the data-smoothing
filter be x, and let it be assumed that
the system is phase distortionless. This can
always be accomplished, at the cost of lag, by
phase equalization. Then the actual signal
output can be represented by OB, where
OB/OA = x. Let the ratio of noise power to
signal power at this frequency be $k^2$. Then the
output noise can be represented by the vector
BC, at some arbitrary phase angle $\theta$, where
BC/OA = kx.
[Figure 5. Vector relation between input and output
of data-smoothing network.]
The error in the output of the data-smoothing
filter is evidently represented by the vector
AC. We have
$$\begin{aligned}
(AC)^2 &= (OA)^2 \left[ (1 - x - kx\cos\theta)^2 + (kx\sin\theta)^2 \right] \\
       &= (OA)^2 \left[ (1 - x)^2 - 2kx(1 - x)\cos\theta + k^2x^2 \right].
\end{aligned}$$
Since $\theta$ is random the cross-product term involving
$\cos\theta$ disappears on the average. (More
generally, it disappears as long as the noise and
signal are uncorrelated, whether or not their
relative phases are entirely random.) This
leaves the mean square error as
$$(AC)^2_{\mathrm{av}} = (OA)^2 \left[ 1 - 2x + (1 + k^2)x^2 \right]. \qquad (1)$$
The mean square error is a minimum if
$$x = \frac{1}{1 + k^2} = \frac{P_S}{P_N + P_S},$$
where $P_S$ and $P_N$ are, respectively, the signal
and noise power at this frequency. Upon substituting
this result in equation (1) and remembering
that $(OA)^2 = P_S$, we find that the
minimum mean square error is
$$(AC)^2_{\min} = \frac{P_S P_N}{P_S + P_N}. \qquad (2)$$
Equation (2) evidently represents the sought-for
rule for the filter transmission characteristic.
It is illustrated in Figure 6, where $P_N$
and $P_S$ have been chosen respectively as the
flat curve and the $1/\omega^2$ curve in Figure 7. In
comparison with the characteristics of typical
filters in communication systems it is quite
rounded with a relatively slowly falling amplitude
characteristic. More important than the
detailed rule for the transmission characteristic,
however, is the conclusion that the shape
of the characteristic is not very critical. There
is very little loss in replacing the actual curve
in Figure 6 by any other similar characteristic.
For example, we might validate the
assumption of zero phase distortion by making
use of the curve which automatically gives a
linear phase shift.150
[Figure 6. Optimum transmission characteristic
for data smoothing assuming signals with random
noise characteristics.]
[Figure 7. Signal and noise spectra assumed
in Figure 6.]
A more extreme illustration is furnished by
the infinitely selective filter characteristic, with
perfect transmission in the range in which the
signal power is greater than the noise power,
and zero transmission elsewhere, indicated by
the broken lines in Figure 6.
It follows from equation (1) that in the
neighborhood of the cutoff point ω0 the mean
square error for this filter is twice that of the
optimum structure. In most frequency ranges,
however, the penalty is far less than this. Since
even a two-to-one change in the mean square
error would produce no tremendous improve-
ment in the effectiveness of fire, it is clear that
the result to which we are led by this method
of attack is by no means critical.
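The insensitivity claimed above can be checked numerically. The following is a minimal sketch, assuming the spectra of Figure 7 are a flat noise spectrum P_N = 1 and a signal spectrum P_S = 1/ω^2 (so that the cutoff point falls at ω = 1); the function names are illustrative and do not come from the report. It evaluates equation (1) for the optimum transmission and for the infinitely selective filter and reports the ratio of the two errors.

```python
def mean_square_error(x, ps, pn):
    """Equation (1): (OA)^2 [1 - 2x + (1 + k^2) x^2] with (OA)^2 = ps, k^2 = pn / ps."""
    k2 = pn / ps
    return ps * (1.0 - 2.0 * x + (1.0 + k2) * x * x)


def optimum_transmission(ps, pn):
    """Power-ratio rule: transmission that minimizes equation (1)."""
    return ps / (ps + pn)


def minimum_error(ps, pn):
    """Equation (2): error obtained with the optimum transmission."""
    return ps * pn / (ps + pn)


def brick_wall_transmission(ps, pn):
    """Infinitely selective filter: pass where signal exceeds noise, block elsewhere."""
    return 1.0 if ps > pn else 0.0


def penalty(omega):
    """Ratio of brick-wall error to optimum error at one frequency (assumed spectra)."""
    ps, pn = 1.0 / omega ** 2, 1.0  # assumed: 1/omega^2 signal, flat noise
    selective = mean_square_error(brick_wall_transmission(ps, pn), ps, pn)
    return selective / minimum_error(ps, pn)


for omega in (0.3, 1.0, 3.0):
    print(f"omega = {omega}: error ratio = {penalty(omega):.3f}")
```

Under these assumptions the ratio is 2 at the cutoff point and falls to within about ten per cent of unity a factor of three away from it on either side.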
LAGS IN DATA-SMOOTHING FILTERS
The analysis just concluded has been directed
at the amplitude characteristics of a data-
smoothing filter. By virtue of the relations be-
tween the amplitude and phase characteristics
of physical networks mentioned earlier in the
chapter, however, the analysis permits us to
give at least a partial description also of the
phase characteristics of the filters. This is an
important consideration because it bears upon
the question of time delays in data-smoothing
systems which was mentioned in Chapter 7.
Figure 8. Some filter attenuation characteristics.
The general nature of the relationship in
simple cases is illustrated by Figures 8 and 9.
Figure 9. Corresponding minimum phase characteristics.
Figure 8 shows a series of rising attenuation
characteristics equivalent to rather unselective
falling amplitude characteristics of the general
type shown by the principal curve in Figure 6.
Figure 9 shows the corresponding phase char-
acteristics computed on a minimum phase shift
basis. In Figure 8 the central attenuation char-
acteristic B has been so chosen that the corre-
sponding phase characteristic in Figure 9 is
exactly a straight line at low frequencies,
where the transmitted amplitudes are appreci-
able. Curves A and C in the two drawings show
slightly different cases, but it is clear from
the figures that the tendency of the phase
characteristics to approximate linearity is still
marked.
In communication engineering a phase char-
acteristic proportional to frequency is inter-
preted as indicating a delay in seconds equal to
the slope dB/dω of the phase characteristic.
This relation is illustrated most simply by an
ideal line. The ideal line has zero attenuation
combined with a phase shift which is propor-
tional to frequency and which at any given fre-
quency is also proportional to the length of the
line in question. If we apply any arbitrary
wave to the line it is propagated down the line
with a definite velocity and unchanged wave
form. The time required for the wave to reach
any point on the line is equal to the slope of the
phase characteristic to that point.
In a structure like a filter, which has an at-
tenuation characteristic varying with fre-
quency, it is of course no longer possible to
transmit an arbitrarily impressed wave with-
out change in wave shape. Even if the applied
wave is merely a suddenly applied d-c voltage
or single frequency sinusoid, there is a tran-
sient period before the response approximates
its final value. In structures having a substan-
tially linear phase characteristic over any fre-
quency range in which they exhibit an appreci-
able amplitude response, however, this total
transient characteristic falls naturally into two
parts. The first is a waiting period equal to the
slope of the phase characteristic, during which
the response is very small, whereas the second
is a true transient period in which the response
is substantial but does not resemble the final
steady-state response. This is illustrated by
Figure 10 which shows the voltage at the fifth
section of a conventional low-pass filter in
response to a d-c voltage applied at zero time
at the input terminals.1" The end of the waiting
period, as deduced from the slope of the phase
characteristic, is indicated by the broken line.
Figure 10. Voltage at fifth section of conventional
low-pass filter in response to unit d-c voltage.
Delays of the sort just illustrated must be
expected in a data-smoothing filter whenever
the nature of the signal is changed. This hap-
pens at the beginning of tracking, in changing
from one target to another, or even in follow-
ing a single target when the target makes an
abrupt change in course. Since usable data in
a fire-control system must be quite accurate,
the delay to be allowed for must include both
the initial waiting period and the subsequent
transient period until the transient ripples
have almost vanished. A considerable part of
the art of designing data-smoothing networks
consists in controlling the design so that these
final transient ripples decay relatively rapidly.
We are not yet ready to discuss this problem.
It will turn out, however, that the minimum
interval which can be assigned to the "true
transient" period is about equal to that which
must be allowed for the initial waiting period.[c]
Thus the slope of the phase characteristic can
be used as an index of the lags which must be
expected in data smoothing merely by doubling
the delay to which the slope would normally be
said to correspond.
When we use the phase slope as an index of
delay it becomes immediately apparent that
lags are the necessary consequence of smooth-
ing in physical circuits. This is easily seen by
reference to the relations which must exist be-
tween attenuation and phase characteristics in
physical structures. An example is provided by
the formula15*1
(dB/d\omega)_{\omega=0} = \frac{2}{\pi} \int_0^\infty \frac{A - A_0}{\omega^2} \, d\omega   (3)
where A is attenuation, A0 is the attenuation
at zero frequency, and B is phase shift. In other
words, the delay (measured by the slope of the
phase characteristic at zero frequency) is proportional
to the integral of the attenuation on
an inverse frequency scale when the attenuation
at zero frequency is taken as the reference.
The equation thus states that the system will
exhibit a lagging response as long as there is a
net high-frequency attenuation. As a numerical
illustration, let it be supposed that A is zero
below ω = 1. This corresponds to the estimate
made earlier in the chapter that the input sig-
nal components in antiaircraft work lie roughly
in the band below about 0.1 or 0.2 cycle per sec-
ond. Let it be supposed also that A at higher
frequencies is equal to 3 nepers, corresponding
to an average amplitude reduction of about 20
to 1. Then dB/dω at the origin is given from
equation (3) as 6/π seconds, and in accordance
with the rule just enunciated the minimum delay
to be expected from such a structure in a
data-smoothing application would consequently
be 12/π seconds.
[c] This is not intended to imply that the distinction
between the initial waiting period and the "true transient"
period is quite as sharp as it is in Figure 10. The
selectivity in a data-smoothing filter is usually not
great enough to justify the assumption that components
beyond the linear phase region are of negligible importance.
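The numerical illustration can be reproduced with a short calculation. This is only a sketch, assuming equation (3) has the form dB/dω at ω = 0 equal to (2/π) times the integral of (A - A0)/ω^2 over all frequencies, which is the form consistent with the figures quoted in the text. The substitution u = 1/ω turns dω/ω^2 into du, so the integral becomes an ordinary area on an inverse frequency scale. The function names and the step count are illustrative choices.

```python
import math


def attenuation(omega):
    """Illustrative attenuation in nepers: zero below omega = 1, 3 nepers above."""
    return 3.0 if omega > 1.0 else 0.0


def phase_slope_at_zero(att, a0=0.0, u_max=2.0, steps=100_000):
    """Evaluate (2/pi) * integral of (A - A0) / omega^2 d(omega).

    With u = 1/omega the integrand is simply A(1/u) - A0 integrated over u.
    The range 0 < u < u_max is assumed to cover every frequency at which the
    attenuation differs from a0 (true here, since A = A0 below omega = 1).
    """
    du = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du  # midpoint rule on the inverse frequency scale
        total += (att(1.0 / u) - a0) * du
    return 2.0 / math.pi * total


slope = phase_slope_at_zero(attenuation)
print(f"phase slope at zero frequency: {slope:.4f} s (6/pi = {6 / math.pi:.4f})")
print(f"lag allowed for in smoothing:  {2 * slope:.4f} s (12/pi = {12 / math.pi:.4f})")
```

Doubling the slope follows the rule stated in the text, which allows as much time for the true transient as for the initial waiting period.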
Aside from such specific quantitative rela-
tions equation (3) is useful as a basis for a
number of important qualitative conclusions.
One, for example, is the fact that although a
lag is a necessary concomitant of any system
showing a high-frequency attenuation, the
amount of the lag depends greatly upon the
portion of the frequency spectrum in which
the attenuation is found. Since the integral is
taken on an inverse frequency scale, a small
attenuation at low frequencies is much more
important than a considerably greater attenua-
tion further out in the spectrum. This points to
the desirability of designing tracking instru-
ments which generate principally high-fre-
quency noise, even if the amplitude of the noise
is somewhat increased thereby. We may also
notice that since the attenuation is a logarith-
mic function of amplitude an initial moderate
reduction in the amplitude of disturbing noise
may be much less expensive in lag than subse-
quent attempts at further reduction. For ex-
ample, an amplitude reduction from 100 to 10
per cent over a given portion of the frequency
spectrum produces no more lag than a subse-
quent reduction from 10 to 1 per cent.
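Both qualitative conclusions follow from the inverse frequency weighting and can be illustrated with a closed-form sketch. It assumes the same form of equation (3) as above, applied to a uniform attenuation confined to one band; the band edges and neper values below are hypothetical examples, not figures from the report.

```python
import math


def lag_from_band(nepers, w_lo, w_hi=math.inf):
    """Contribution to dB/d(omega) at zero from a uniform attenuation in one band.

    (2/pi) * nepers * integral of d(omega)/omega^2 from w_lo to w_hi.
    """
    return 2.0 / math.pi * nepers * (1.0 / w_lo - 1.0 / w_hi)


# Successive tenfold amplitude reductions each add ln(10) nepers over the band,
# so each step costs the same lag.
step = math.log(10.0)
first = lag_from_band(step, 1.0)                                  # 100 -> 10 per cent
second = lag_from_band(2 * step, 1.0) - lag_from_band(step, 1.0)  # 10 -> 1 per cent

# A small attenuation at low frequencies against a larger one further out.
low = lag_from_band(1.0, 0.5, 1.0)    # 1 neper between omega = 0.5 and 1
high = lag_from_band(4.0, 5.0, 10.0)  # 4 nepers between omega = 5 and 10

print(f"lag for 100 -> 10 per cent: {first:.3f} s")
print(f"lag for  10 ->  1 per cent: {second:.3f} s")
print(f"1 neper at low frequency:   {low:.3f} s")
print(f"4 nepers at high frequency: {high:.3f} s")
```

In this example one neper placed just below ω = 1 costs more lag than four nepers placed a decade higher.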
WIENER'S PREDICTION THEORY-
ZERO NOISE CASE
In Chapter 7 we distinguished between what
we called the simple data-smoothing problem
and the data-smoothing and prediction prob-
lem. The simple problem, with which this re-
port is chiefly concerned, is the one which has
been given principal attention thus far. On
account of its broad interest, however, it seems
worth while to include also a brief statement
of Wiener's solution of the general problem.
The method of development used here is intui-
tive and nonrigorous in comparison with
Wiener's own development, but it permits the
principal relations to be established by very
elementary means.
It is convenient to consider first the zero
noise case. The past history of the signal, then,
is known perfectly, and the existence of a
prediction problem depends entirely upon the
fact that since the signal is assumed to be sta-
tistical in character, its future is not com-
pletely determined from its past. The situation
can be thought of in the terms suggested by
Figure 11. The actual signal output appears at
P1. In accordance with the discussion earlier
in the chapter, we imagine this signal to be
generated by passing flat noise through the
shaping network N1. The transfer admittance
Y1(iω) of N1 is determined from the power
spectrum of the signal by the procedure outlined
earlier and is a minimum phase shift characteristic.
It will be recalled that minimum
phase shift transfer admittances have the important
property that their reciprocals are also
the transfer admittances of physically realizable
networks.
Figure 11. Schematic representation of Wiener's
prediction theory when there is no noise. (Blocks shown:
flat noise source, shaping network, network, predicting network.)
From Y1 we can readily compute the transient
response characteristic of N1. We shall
assume for illustrative purposes that the impulsive
admittance of N1 takes the special
shape shown by Figure 12.
Figure 12. Assumed impulsive admittance of
shaping filter.
The flat noise is thought of as consisting of
a large number of elementary impulses with
random amplitudes and occurring at random
times. For the purposes of this analysis, how-
ever, it is sufficient to consider only the three
unit impulses shown in Figure 13. Impulse B
is supposed to occur at the instant at which
the prediction is to be made, A occurs two sec-
onds in the past, and C, one second in the
future. The response of N1 to these three im-
pulses will evidently be three curves of the
sort given by Figure 12, suitably displaced in
time as shown by Figure 14.
Figure 13. Impulses giving rise to applied signal
through shaping filter.
The desired output of the predicting network
is the curve of Figure 14 advanced by the pre-
diction time, which we can assume, for illus-
tration, to be two seconds. It may be assumed
for the sake of preliminary analysis that the
input of the predicting network is the three
original impulses of Figure 13. The terminal
P2 at which they are supposed to appear is of
course a purely fictitious one and is not accessible
to us physically. We can, however, construct
the equivalent terminal P'2 by imposing
the actual signal from terminal P1 on the network
N2, whose transfer admittance is the
reciprocal of that of N1.
Figure 14. Applied signal at P1.
Let the predicting network connected to terminal
P'2 be represented by N3. Obviously a
perfect prediction would be secured if N3 could
be assigned the impulsive admittance shown in
Figure 15, that is, an impulsive admittance
equal to the impulsive admittance of the original
network but moved forward by the 2-second
prediction time. Then all the constituent curves
and the sum curve in Figure 14 would similarly
be moved forward. Of course we cannot assign
N3 an impulsive admittance which is different
from zero at negative times without postulating
a nonphysical network. It is, however, perfectly
possible to define N3 from the portion of
the impulsive admittance characteristic at positive
times, with the remainder set equal to
zero. This gives an impulsive admittance of
the type shown by Figure 16. When energized
by the three unitary impulses, it gives the
result shown in Figure 17. The contributions
of impulses A and B are not affected by the
absence of a negative time portion of the impulsive
admittance, but the contribution of impulse
C is lost.
Figure 15. Ideal impulsive admittance of prediction
network N3 in Figure 11.
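The construction can be traced with the three impulses of Figure 13. The sketch below is illustrative only: the actual shape of Figure 12 is not recoverable here, so a decaying exponential is assumed for the impulsive admittance of the shaping network, and all function names are hypothetical. It forms the realizable predictor response by advancing the assumed admittance 2 seconds and discarding the part at negative time, then compares the prediction made at the instant of impulse B with the true signal 2 seconds later. It also evaluates the energy of the discarded portion, which is the quantity that governs the mean square error for flat noise of unit density.

```python
import math

ALPHA = 2.0  # prediction time in seconds, as in the text

# Times of the unit impulses A, B and C relative to the instant of prediction.
IMPULSES = {"A": -2.0, "B": 0.0, "C": 1.0}


def W(t):
    """Assumed impulsive admittance of the shaping network (hypothetical shape)."""
    return math.exp(-t) if t >= 0.0 else 0.0


def h(t):
    """Realizable predictor response: W advanced by ALPHA, zero at negative time."""
    return W(t + ALPHA) if t >= 0.0 else 0.0


def signal(t):
    """Actual signal: sum of shaping-network responses to all three impulses."""
    return sum(W(t - ti) for ti in IMPULSES.values())


def prediction(t):
    """Predictor output at time t when energized by the same three impulses."""
    return sum(h(t - ti) for ti in IMPULSES.values())


true_future = signal(0.0 + ALPHA)
predicted = prediction(0.0)
lost = true_future - predicted  # contribution of impulse C


def discarded_energy(alpha, steps=20_000):
    """Midpoint-rule estimate of the integral of W^2 from 0 to alpha."""
    dt = alpha / steps
    return sum(W((i + 0.5) * dt) ** 2 for i in range(steps)) * dt


print(f"true signal at t = 2:   {true_future:.4f}")
print(f"prediction made at t=0: {predicted:.4f}")
print(f"lost contribution of C: {lost:.4f} (W(1) = {W(1.0):.4f})")
print(f"energy of discarded portion: {discarded_energy(ALPHA):.4f}")
```

The contributions of A and B are reproduced exactly, and the whole of the error is the response to impulse C, which had not yet occurred when the prediction was made.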
To formulate a physical prediction network
we have merely to find by conventional methods
the steady-state admittance Y3 corresponding
to the impulsive admittance of Figure
16. The two networks N2 and N3 may then be
combined to give a single structure with the
transfer admittance Y2Y3 = Y3/Y1 which will
give the complete prediction when energized by
the actual signal.
Figure 16. Realizable portion of required impulsive
admittance.
Figure 17. Response of realizable prediction network.
The mean square error in prediction is
easily determined from the fact that the contributions
of all impulses of the sort represented
by C, occurring in the prediction interval,
are lost. Since impulses in the flat noise
source occur at random times the mean square
error is proportional to
\int_0^\alpha W^2(\tau) \, d\tau,
where α is the prediction time and W is the impulsive
admittance of Figure 12. Since the flat noise
impulses occurring after the time at which the
prediction is made are surely unpredictable, it
is clear that this error is the least we could
expect any physical prediction network to have.
WIENER'S THEORY-GENERAL CASE
When the input data includes noise as well as
the signal it is natural to think of the situation
in the manner shown by Figure 18. The first
source of flat noise, together with the shaping
network Na, is the combination we have already
used to represent the signal in the noise-free
case. The addition of noise is represented by
the second independent source of flat noise with
its associated shaping network Nb. They combine
to give the total input measured at P1.
Figure 18. Circuit representation of random functions
representing signal and noise.
This diagram emphasizes the fact that we
think of the noise and signal as originating
from different physical sources. By postulate,
however, we are not able to separate the
sources experimentally. So far as any observed
result is concerned, consequently, we may as
well deal with the simplified structure shown
in Figure 19 which contains a single source of
flat noise and a single shaping network. The
transfer admittance of the shaping network N1
is determined by adding the power spectra of
signal and noise, converting the result to an
amplitude characteristic, and computing the
corresponding minimum phase according to
the methods already used for the noise-free
case.[d]
Figure 19. Schematic representation of Wiener's
prediction theory when there is noise.
[d] Note that the shaping network thus obtained is not
the same as the one we would secure by adding the
transfer admittances of Na and Nb in Figure 18 directly.
In order to realize the same total power at P1
in each case, it is necessary to begin by adding the
powers rather than the amplitude characteristics associated
with the two paths.
Although we cannot separate the signal from
the noise completely, we saw earlier that the
mean square difference between the total input
and the signal is minimized if we multiply the
amplitude of the input at each frequency by
the ratio of the signal power to the sum of the
signal and noise powers. A fictitious filter
having the prescribed amplitude characteristic
is represented by N4 in Figure 19. We assign
N4 a zero phase characteristic so that there
may be no lag in producing the result at P3.
Thus the output at P3 at any instant represents
the best conceivable estimate (in the least
squares sense) of the signal at that instant.
The assumption of zero phase, of course, makes
N4 nonphysical, since it must have at least the
minimum phase characteristic associated with
its prescribed amplitude characteristic. This,
however, is not an objection here since the
structure is introduced purely for purposes of
analysis.
The situation is now reduced to a form in
which it is substantially equivalent to the one
appearing in the zero-noise case. We assume a
series of random impulses at P2, which would
produce responses at P3. The problem is that
of advancing the response to each impulse so
that the same result appears α seconds earlier
at terminal P4. The solution is represented by
networks N2 and N3, which discharge functions
similar to those of the correspondingly labeled
networks in Figure 11. Thus, the network N2
is the reciprocal of N1 and is provided to make
terminal P'2 equivalent to P2 as a source of impulses.
Network N3 is defined by an impulsive
admittance obtained from the impulsive admittance
between P2 and P3 by advancing the
latter characteristic α units in time and then
discarding the portion at negative time.
In this procedure there is only one point at
which the situation differs from that without
noise. In the noise-free case, the original im-
pulsive admittance which we wished to advance
in time was identically zero at negative times.
In order to secure a physically realizable re-
sult, we needed only to discard the portion of the
impulsive admittance between t = 0 and t = α.
In the present situation, on the other hand, the
impulsive admittance is taken from a path including
the nonphysical network N4. Thus the
admittance may be expected to take such form
as that shown in Figure 20, with nonzero amplitudes
at both negative and positive times,
and in order to secure a physical final network
it is necessary to discard everything to the left
of the line α.
Figure 20. Typical impulsive admittance of best
smoothing network N4 in Figure 19.
This difference in the impulsive admittance
characteristics has two consequences. The first
is the fact that since the uncertainty of the
prediction is measured by the amount of im-
pulsive admittance which must be discarded,
it is evidently greater in the present case where
we are discarding much more. The second is
the fact that in the noise-free case uncertainty
exists only for a positive prediction time. A
negative prediction time, which corresponds, of
course, to the determination of the value as-
sumed by the signal at some time in the past,
can be set into the analysis as easily as a posi-
tive prediction time, merely by shifting the im-
pulsive admittance to the right rather than the
left. In the noise-free case, however, there is
nothing to be discarded when we shift to the
right, since the impulsive admittance with
which we begin is in any case identically zero
for negative times. Thus the uncertainty in
the determination of any past value of the sig-
nal is zero. Since we have postulated no noise
to confuse the data, this is, of course, an
inevitable result. As soon as noise is included,
on the other hand, there is no such sharp dis-
tinction between the future and the past.[e] The
uncertainty in the determination of the true
value of the signal in the near past is almost
as great as it is in estimating what the signal
will be in the near future. As we go further
and further into the past the uncertainty
gradually diminishes. If we can allow ourselves
unlimited lag, we at length reach a point at
which the discarded portion of the impulsive
admittance characteristic is negligibly small.
This, however, does not mean that all uncertainties
have disappeared, but merely that we
can base our estimate of the signal upon the
power-ratio rule developed previously.
[e] This statement is to be understood in a physical
rather than a mathematical sense. It is not intended
to imply that there may not be sharp changes of behavior
in the impulsive admittance at zero.
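The two-sided character of the best smoothing admittance can be seen in a simple case. The sketch below assumes a flat noise spectrum P_N = 1 and a signal spectrum P_S = 1/ω^2, for which the power-ratio rule gives the zero-phase transmission 1/(1 + ω^2). These spectra, the truncation frequency, and the function names are assumptions made for illustration. The impulsive admittance is obtained by a truncated numerical inverse Fourier transform and compared with the known transform pair, one half of exp(-|t|).

```python
import math


def optimum_weight(omega):
    """Ps / (Ps + Pn) with Ps = 1/omega^2 and Pn = 1, written to avoid dividing by zero."""
    return 1.0 / (1.0 + omega * omega)


def impulsive_admittance(t, omega_max=400.0, d_omega=0.01):
    """Zero-phase inverse transform: (1/pi) * integral_0^omega_max W(w) cos(w t) dw.

    Trapezoidal rule; the upper limit is a finite stand-in for infinity, so the
    result is slightly low near t = 0.
    """
    steps = round(omega_max / d_omega)
    total = 0.0
    for i in range(steps + 1):
        omega = i * d_omega
        term = optimum_weight(omega) * math.cos(omega * t)
        if i == 0 or i == steps:
            term *= 0.5  # end points carry half weight
        total += term
    return total * d_omega / math.pi


for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    exact = 0.5 * math.exp(-abs(t))
    print(f"t = {t:+.1f}: numerical {impulsive_admittance(t):.4f}, exact {exact:.4f}")
```

The response is symmetrical about zero time, so a network restricted to positive times must give up a substantial part of it unless a lag is allowed.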
8.8 OVERALL CHARACTERISTICS OF
PREDICTING NETWORKS
It has been fairly easy to develop a qualitative
picture of the general characteristics of
typical data-smoothing networks. As we have
seen, they have amplitude characteristics of the
low-pass filter type combined with lagging
phase shifts. No corresponding qualitative pic-
ture of the characteristics of a typical overall
predicting circuit has, however, been developed
as yet. The discussion just concluded provides
a rule for determining the characteristics of a
predicting circuit in any given case, but pro-
vides comparatively little in the nature of a
description of the result we may expect to
secure.
In any particular situation we can, of course,
calculate the overall characteristics of the pre-
dicting circuit. A simpler way of character-
izing the overall predictor characteristic quali-
tatively, however, is based upon the use of the
attenuation-phase relations for physical net-
works. We need merely use such an equation
as (3) backward. Thus, we have previously
shown that a positive phase slope corresponds
to a lagging output. Correspondingly, a nega-
tive phase slope can be interpreted to repre-
sent a lead, or in other words, a prediction.[f]
If we assign (dB/dω) at ω = 0 in equation (3) a
negative value, we see that A - A0 must on the
average be negative. In other words, the am-
plitude characteristic of an overall prediction
circuit must rise, on the average, as we proceed
upward from zero frequency. This is in marked
contrast to a data-smoothing network, which,
as we have seen, tends to have a low-pass filter
type of characteristic with a falling amplitude
characteristic at high frequencies. The in-
creased amplitude of response may have two
detrimental effects. In the first place, it evi-
dently produces a distorting effect on any sig-
nal components to which it applies. In the
second place, it produces an exaggerated re-
sponse to noise.
Examples of the characteristics of overall
prediction circuits are readily constructed by
reference to the circuit of Figure 21. Various
particular results are obtained by assigning
particular characteristics to the data-smoothing
network. Thus, if the data-smoothing network
is absent entirely the transmission
through the path containing the differentiator
is iωt_f, since differentiation is equivalent to
multiplication by iω. The attenuation of the
overall circuit is consequently A = -log
|1 + iωt_f|. This is plotted as curve I of Figure
22. The increasing amplitude characteristic at
high frequencies is obviously due fundamentally
to the increased transmission through the
differentiator circuit.
Figure 21. One-dimensional prediction circuit
with data-smoothing networks.
[f] This, of course, does not mean that a network with
a negative phase slope can predict a perfectly arbitrary
event. We can hope to realize a negative phase slope,
in combination with a flat amplitude characteristic,
over only a finite band. The spectrum of an arbitrary
event, that is, any suddenly applied signal, will always
include important components running out to infinite
frequency, where the negative phase slope can no longer
be realized. The statement does, however, mean that if
we suddenly apply a signal made up of one or more
low-frequency sinusoids, and wait for the steady state
to become established, the output will appear to lead
the input by a time equal to the slope of the negative
phase characteristic.
If the data-smoothing network is assigned
the characteristic (1 + iωa)^-1, corresponding to
a very simple low-pass filter type of response,
the overall transmission becomes that shown
by curve II in Figure 22. (It is assumed that
a = t_f, for simplicity.) The negative attenuation
at high frequencies is much reduced. This is
paid for by an increased amplitude of response
at low frequencies, but since the integration in
(3) takes place on an inverse frequency scale,
the low-frequency fragment is much less than
the gain reduction at high frequencies. Curve
III shows the result when the data-smoothing
network is assigned the characteristic
(1 + iωa)^-2. Finally, curve IV shows the result
obtainable when there is also a filter in the
present-position circuit (as shown by the
broken lines in Figure 21), so that there may
be a net positive attenuation at high frequencies.
Figure 22. Attenuation characteristics of prediction
circuit shown in Figure 21.
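The first three curves can be tabulated from the expressions just given. This is a sketch under stated assumptions: the overall transmission is taken as unity through the present-position path plus iωt_f through the differentiator path, with the data-smoothing network placed in the differentiator path and a = t_f = 1. The circuit of Figure 21 is not reproduced here, so this reading of it, and the function names, are assumptions. Curve IV is omitted because the filter in the present-position circuit is not specified.

```python
import math


def attenuation_nepers(transmission):
    """Attenuation A = -log|transmission|, in nepers (negative values are gain)."""
    return -math.log(abs(transmission))


def curve_1(omega, t_f=1.0):
    """No data smoothing: present position plus pure differentiator path."""
    return 1.0 + 1j * omega * t_f


def curve_2(omega, t_f=1.0, a=1.0):
    """Differentiator path smoothed by (1 + i omega a)^-1."""
    return 1.0 + 1j * omega * t_f / (1.0 + 1j * omega * a)


def curve_3(omega, t_f=1.0, a=1.0):
    """Differentiator path smoothed by (1 + i omega a)^-2."""
    return 1.0 + 1j * omega * t_f / (1.0 + 1j * omega * a) ** 2


print("omega    curve I   curve II  curve III")
for omega in (0.1, 0.3, 1.0, 3.0, 10.0, 100.0):
    values = [attenuation_nepers(f(omega)) for f in (curve_1, curve_2, curve_3)]
    print(f"{omega:6.1f} " + " ".join(f"{v:9.3f}" for v in values))
```

With these assumptions the gain of curve I grows without limit, that of curve II levels off at log 2 nepers, and that of curve III returns toward zero at high frequencies, while both smoothed curves show somewhat more gain than curve I at low frequencies.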
In view of the inverse frequency scale in (3),
the gross negative attenuation will be mini-
mized if the negative attenuation region is
placed very close to zero frequency. This, how-
ever, means that much of the signal energy
falls in the negative attenuation region so that
in certain respects, at least, the signal response
must be seriously injured. For example, in the
specific circuits just discussed we can place the
negative attenuation region at very low fre-
quencies by choosing very long time constants,
a, in the data-smoothing networks, with the
consequence that the circuits will operate cor-
rectly for any long continued straight line path,
but will be very sluggish in changing from one
straight line to another. If the negative attenu-
ation region is placed at higher frequencies, on
the other hand, the signal response is improved
but beyond certain limits the circuit becomes
unbearably sensitive to noise.
Quantitative illustrations of these relation-
ships are quickly constructed. Suppose, for ex-
ample, that the prediction time is 2 seconds.
From (3) this is consistent with an attenua-
tion characteristic having zero attenuation
below ω = 1 and a net gain of π nepers thereafter.
In other words, the amplitudes of all
frequencies above ω = 1 are increased by a factor
of about 22 to 1. If the region of added
gain is pushed to a higher frequency or concentrated
within a narrow band, the multiplying
factor rapidly becomes larger. For example,
if we maintain A at approximately zero
below ω = 2, the average gain above this point
must be 2π nepers, corresponding to a multiplying
factor of 500 to 1. We secure the same
factor by attempting to concentrate the region
of negative attenuation in the band between
ω = 1 and ω = 2. The multiplying factor also
goes up rapidly as we increase the prediction
time. For example, with the gain uniformly
spread over the frequency region above ω = 1
the multiplying factor is 500 for a prediction
time of 4 seconds, or more than 10,000 for a
prediction time of 6 seconds.
Reasonable multiplying factors with long
prediction times can be obtained only by carry-
ing the negative attenuation region to very low
frequencies. As indicated previously, the cost
of this is an increase in the time required for
the signal to change from one constant or
nearly constant value to another. For exam-
ple, in the first illustration above, if the region
of π nepers net gain is carried down from
ω = 1 to ω = 0.2 the integral in (3) is just five
times as great as it was before, so that the
characteristic corresponds to a prediction time
of 10 rather than 2 seconds. This change
would correspond to an increase[g] from perhaps
4 or 5 to perhaps 20 or 25 seconds in the time
required for the circuit to settle from one con-
stant value to another.
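These quantitative illustrations can be recomputed directly. The sketch assumes equation (3) in the form dB/dω at ω = 0 equal to (2/π) times the integral of (A - A0)/ω^2, with a uniform gain confined to a single band, so that a prediction time T requires a gain of πT divided by 2(1/ω_lo - 1/ω_hi) nepers. The function names are illustrative.

```python
import math


def inverse_scale_width(w_lo, w_hi=math.inf):
    """Width of the band on the inverse frequency scale: integral of d(omega)/omega^2."""
    return 1.0 / w_lo - 1.0 / w_hi


def required_gain_nepers(prediction_time, w_lo, w_hi=math.inf):
    """Uniform gain needed in the band to give the stated prediction time."""
    return math.pi * prediction_time / (2.0 * inverse_scale_width(w_lo, w_hi))


def multiplying_factor(prediction_time, w_lo, w_hi=math.inf):
    """Amplitude ratio corresponding to the required gain."""
    return math.exp(required_gain_nepers(prediction_time, w_lo, w_hi))


def prediction_time(gain_nepers, w_lo, w_hi=math.inf):
    """Prediction time produced by a uniform gain in the band."""
    return 2.0 / math.pi * gain_nepers * inverse_scale_width(w_lo, w_hi)


cases = [
    ("2 s, gain above omega = 1", (2.0, 1.0)),
    ("2 s, gain above omega = 2", (2.0, 2.0)),
    ("2 s, gain between 1 and 2", (2.0, 1.0, 2.0)),
    ("4 s, gain above omega = 1", (4.0, 1.0)),
    ("6 s, gain above omega = 1", (6.0, 1.0)),
]
for label, args in cases:
    gain = required_gain_nepers(*args)
    print(f"{label}: {gain / math.pi:.0f} pi nepers, factor {multiplying_factor(*args):,.0f}")

print(f"pi nepers carried down to omega = 0.2: {prediction_time(math.pi, 0.2):.1f} s")
```

The factors come out at roughly 23, 535, and 12,400, in line with the rounded figures quoted in the text, and carrying the π neper region down to ω = 0.2 gives a prediction time of 10 seconds.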
Practical examples of the transmission char-
acteristics of overall prediction circuits, with
particular emphasis on the dominant effect of
even very small negative attenuations at ex-
tremely low frequencies, are shown later in
Figures 5 to 8, inclusive. In the linear predic-
tor, A - A0 varies as -kω^2 near zero, and it is
easily seen that such a term makes a finite contribution
to the integral in (3). On the other
hand, the attenuation of the quadratic predictor,
which is capable of dealing exactly with
polynomial functions of time of the second
degree or less, is necessarily zero at the origin[h]
to terms of the order of ω^4, so that the integral
in this region can be neglected. This slight
difference between the two characteristics at
frequencies of the order of 0.01 cycle per
second and below is sufficient to balance the
obviously greater negative attenuation of the
quadratic predictor at higher frequencies.
[g] Only rough numbers can be given, since circuits
with the square-cornered attenuation characteristics
chosen for illustrative purposes would have very ripply
transient characteristics, corresponding to no very well
marked settling time.
[h] See the material on Quasi-Distortionless Prediction
Networks in Appendix A.
Chapter 9
THE ASSUMPTION OF ANALYTIC ARCS
The discussion in the previous two chap-
ters has been based upon the assumption
that the least squares criterion forms a suita-
ble measure of performance for a predicting
network. This assumption permitted us to re-
strict our attention to the amplitude spectra
of the signal and noise, leaving phase relations
entirely out of account. Thus, both signal and
noise could be thought of as "random noise"
functions characterized by random phases and
Gaussian distributions, as described in the
preceding chapter. So far as the noise is con-
cerned, there seems to be nothing wrong with
this assumption. In the case of the signal, how-
ever, it appears that significant phase relations
may exist. This chapter will consequently set
up an alternative analysis which permits the
significance of possible phase relations in the
target paths to be estimated.
The alternative analysis is based upon the
assumption that the target courses are sequen-
ces of analytic segments of different lengths
joined together. These segments are simple
predictable curves such as straight lines, pa-
rabolas, and circles. Significant phase relations
are implied by the assumption that there are
sudden changes from one type of course to
another.
This picture of target paths is, of course,
extreme. There are no such sharp discontinui-
ties between one segment and another, nor do
airplanes fly perfectly along simple curves
even for limited periods. Nevertheless, it is
the conception of target courses upon which
the rest of our analysis is based. The reasons
for believing that it is a closer approximation
to actual target courses than, say, a random
noise function with the same power spectrum
would be, are given later. Perhaps more im-
portant is the fact that the possibility of hit-
ting an airplane flying along such a simple
analytic arc is much greater than it would be
if we were attempting to predict a correspond-
ing random noise function. It is thus advan-
tageous to take the analytic arc assumption as
a basis for designing the prediction circuit,
even if the assumption seems to be reasonably
well justified over only occasional segments of
actual target paths. An example of such a
situation is furnished by the bombing run
illustration described in Chapter 7.
As a corollary to the analytic arc assump-
tion it is also assumed that the theoretical
predicted point must be quite close to the actual
target position if the probability of scoring a
hit is to be appreciable. In other words, such
dispersive factors as random errors in com-
puter or gun or the lethal radius of the shell,
which would tend to produce occasional hits at
long distances from the theoretical predicted
point, are quite small. This is such a plausible
assumption in the light of present-day antiair-
craft experience that its critical importance in
the present argument is likely to go unper-
ceived. However, this is the assumption which
limits consideration to small errors in predic-
tion, whereas the least squares criterion natu-
rally gives greatest emphasis to large errors.
If, for example, antiaircraft projectiles were
suddenly endowed with a much greater de-
structive radius, we would be much more in-
terested in fairly large misses, and the objec-
tions to the least squares criterion would disap-
pear.
These postulates are discussed in more detail
in the following sections. In anticipation of
this discussion the following conclusions may
be mentioned:
1. With the assumptions as stated, the pre-
diction should be on a modal rather than a
least squares basis. In other words, the gun
should be aimed at the most probable future
position of the target.
2. Modal prediction requires evaluation of
the parameters of the analytic arc the target
is at present traversing. This can be accom-
plished by smoothing the values of these pa-
rameters evaluated for a period in the past.
3. If the smoothing is performed by linear
invariable networks, the impulsive admittances
of these networks should have a definite cutoff
after a finite smoothing time. By this means
all data over a certain age are given zero weight.
The method of calculating the proper smooth-
ing time is developed.
4. Definite advantages can be obtained from
circuits with variable smoothing times if such
systems can be satisfactorily mechanized.
THE TARGET COURSES
The target courses, like the tracking errors,
can be thought of as a statistically generated
set of functions — that is, a stochastic process.
The structure of this process is, however, very
different from that of the tracking errors. It
is by no means satisfactory to assume the
target courses to be equivalent to a random
noise having the same power spectrum as the
target courses. As we pointed out in Chapter
7, the target is piloted by a purposeful human
being. It tends to follow a definite simple curve
for a period of time and then to shift to a new
simple curve. Much of the flight is in attempted
straight lines with constant velocity. Most of
the remainder can be considered to be segments
of circles or helices in space, or as segments of
parabolas or higher degree curves. Straight
line constant speed flight corresponds to the
airplane controls in a neutral position. The
helical flight is a natural generalization allow-
ing arbitrary, but fixed, positions of the con-
trols. The curves which are parabolic functions
of time correspond to constant acceleration in
the three space coordinates. Thus, all these
assumptions have a reasonable physical back-
ground.
Most antiaircraft computers are constructed
on the assumption of straight line flight, al-
though some work has been done in World
War II on curved flight directors both with the
helical and the parabolic assumptions. There is
not a great deal of difference in these two
generalizations from the practical point of
view, since determination of acceleration terms
is subject to such large errors in any case.
The important part of this representation
of the target courses is that they consist of
segments of simple analytic curves joined to-
gether. The individual segments are completely
predictable if we have a part of the segment
given exactly. One need merely evaluate the
parameters of the segment from the given part
and evaluate the curve for t + t_f. The unpre-
dictable part of the target courses is due to the
possibility of sudden changes from one segment
to another. With random noise functions the
unpredictableness occurs continuously.
This simplified description of the target
courses as piecewise analytic functions must
be recognized as only a first approximation. A
more complete description of the target course
would include the "fine structure," the con-
necting curves between the various analytic
segments and the deviations from the segments
due to random air disturbances and similar
causes. This latter effect, the wandering of the
target from its intended path, might be reason-
ably well represented by the addition of a
random noise function to the piecewise analytic
functions described above.
9.2 THE POISSON DISTRIBUTION OF
SEGMENT END POINTS
The analytic segments of which the course
is supposed to consist are not all of the same
duration — we may assume some probability
distribution of the duration of these segments.
The simplest assumption here is that the
breaks occur in a Poisson distribution in time.
This assumption is not necessary for our
analysis but is a reasonable one and leads to
a simple mathematical treatment. Any other
reasonable distribution would give comparable
results.
A series of events is said to occur in a
Poisson distribution in time if the periods be-
tween successive events are independent in the
probability sense and are controlled by a distri-
bution function
\[ p(l) \, dl = \frac{1}{a} \, e^{-l/a} \, dl . \]
Here p(l) dl is the probability of an interval of
length between l and l + dl. This means that
the frequency of intervals of a given length is
a decreasing exponential function of the length.
This type of distribution is familiar in physics
as describing the decay of radioactive sub-
stances. The time a in the distribution function
is the average length of the intervals, since
\[ \int_0^\infty l \, \frac{1}{a} \, e^{-l/a} \, dl = a . \]
It is related to the "half life" b of the interval
by
\[ b = a \ln 2 . \]
The single number a completely specifies the
Poisson distribution. The events may be said
to be happening as randomly as possible apart
from the fact that they occur at an average
rate of 1/a per second.
Another way of describing a Poisson distri-
bution of events is the following. The probabil-
ity of an event in a small interval of duration
dl is (1/a) dl and is independent of whether or
not events have occurred in any other nonover-
lapping intervals.
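The relations between a, the mean interval, and the half life b = a ln 2 can be checked by sampling. The following is a minimal sketch, not part of the original analysis; the value of a, the sample size, and the function name are assumptions chosen only for illustration.

```python
import math
import random

def sample_intervals(a, n, seed=0):
    """Draw n segment durations from p(l) = (1/a) exp(-l/a)."""
    rng = random.Random(seed)
    # expovariate takes the rate, i.e. 1/a events per second.
    return [rng.expovariate(1.0 / a) for _ in range(n)]

a = 20.0  # assumed average segment length in seconds (illustrative only)
intervals = sample_intervals(a, 100_000)

mean_length = sum(intervals) / len(intervals)
half_life = sorted(intervals)[len(intervals) // 2]  # sample median

print(f"mean length {mean_length:.2f} (theory a = {a})")
print(f"half life   {half_life:.2f} (theory a ln 2 = {a * math.log(2):.2f})")
```

Half of the sampled intervals are shorter than the sample median, which is why it plays the role of the half life here.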
9.3 THE PROBABILITY DISTRIBUTION OF
FUTURE POSITIONS
Let us suppose that we have a record of the
course of the target up to the present time and
a complete statistical description of the set of
target courses. What can then be said about the
position of the target t_f seconds from now? If
we were able to analyze the data completely
the most we could obtain would be a probability
distribution function for the future position.
This distribution function would give the prob-
ability, in the light of the course history, of
the target being at any point in space at the
future time. This function would assume large
values at likely points and low values at un-
likely points. For small t_f the distribution
would be highly concentrated and for larger t_f
it would tend to spread out.
In the simple case we have been discussing,
of a Poisson distribution of sudden changes in
type of course, the distribution consists of two
parts. First, there is a spike of probability at
one point, the continuation of the present pre-
dictable segment. Second, there is a continuous
distribution which corresponds to possible
changes to a new segment during the time of
flight. As t_f increases the total probability in
the spike decreases exponentially toward zero,
and the total in the continuous part increases
exponentially toward unity. The behavior is
roughly as indicated in Figure 1.
Figure 1. Probability distribution of future po-
sition of target, assuming piecewise analytic
courses.
A very different type of future position dis-
tribution is exhibited with other assumptions
about the target courses. For example, suppose
the courses were random noise functions with
the power spectrum
\[ P(\omega) = \frac{1}{\alpha^2 + \omega^2} . \]
A typical noise function with this spectrum is
shown in Figure 2. In Figure 3 is shown a
typical velocity under the other assumption,
that the courses are piecewise analytic and in
fact straight lines between breaks. If the
breaks are Poisson distributed, both Figure 2
and Figure 3 have the same power spectrum,
1/(α² + ω²). The future distribution of veloci-
ties for Figure 3 is shown in Figure 1, and for
Figure 2, it will be as shown in Figure 4. In the
random noise case the future distribution is a
Gaussian distribution with no spike. The center
of this distribution decreases exponentially to-
ward zero with increasing time of flight ac-
cording to the formula
\[ X_{t_f} = X_0 \, e^{-\alpha t_f} \]
where X_0 is the present value of the function
and X_{t_f} is the mean of the future distribution.
Figure 2. Typical noise function.
The standard deviation σ of the distribution in-
creases exponentially toward the rms value of
the function according to
\[ \sigma = A \left( 1 - e^{-\alpha t_f} \right) . \]
Figure 3. Typical velocity function.
Supposing that this distribution function
could be determined, where should the gun be
aimed? The answer to this will depend on two
factors: the gun dispersion, and the lethal
effects of the shell. If the gun is aimed to
explode the shell at a certain point in space,
the shell will not necessarily explode at that
point, but rather there will be a distribution of
positions centered about the point aimed at,
because of gun dispersion. Also, if the shell
explodes at a certain point and the target is at
another point, there will be a certain proba-
bility of lethal effect which decreases rapidly
with increasing distance between the points.
Figure 4. Probability distribution of future posi-
tion of target, assuming courses with random
noise properties.
These two functions could be combined by a
product integration to give the probability of
lethal effect if the target is at one point and
the gun aimed to explode the shell at a second
point. To determine the probability of a hit
when aiming at a certain point, then, we should
multiply the probability of the target being at
each point in space by the probability of lethal
effect when it is at that point and integrate the
product over all space. The optimum point of
aim will be the one which maximizes this in-
tegrated product.
In one dimension this may be expressed
mathematically as follows. Let P(x) be the
future position distribution of the target, so
that P(x)dx is the probability of it being in
the interval from x to x + dx at the future time.
Let Q(x,y) be the probability of hitting the
target if the gun is aimed at point y and the
target is at point x. Then the total probability
of a hit when aiming at point y is
\[ R(y) = \int_{-\infty}^{\infty} P(x) \, Q(x,y) \, dx . \]
The point of aim y should be chosen to maxi-
mize R(y).
In the cases we consider, the lethal radius of
the shell and the dispersion of the gun are both
assumed to be small in comparison with the
range of future positions if there is a change
of course during the time of flight. This means
that Q(x,y) is small unless x is very near to y.
Q(x,y) can be, in fact, considered to be a δ
function of (x-y), and the value R(y) is then
just a constant times P(y). Thus, the best
aiming point under this assumption is the most
probable future position of the target. The as-
sumption of small lethal distance is generally
valid with antiaircraft fire and ordinary chemi-
cal explosive shells.
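The dependence of the best aiming point on the lethal radius can be illustrated numerically. The sketch below evaluates R(y) on a grid for a one-dimensional future position distribution made up of a narrow peak (standing in for the spike) and a broad continuous part. All numbers, the Gaussian shapes for P and Q, and the function names are assumptions made purely for illustration.

```python
import math

def gaussian(x, mu, sigma):
    """Normal density, used here only as a convenient illustrative shape."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

DX = 0.5
XS = [-400.0 + DX * i for i in range(2001)]  # grid from -400 to 600

def future_density(x):
    spike = 0.4 * gaussian(x, 0.0, 2.0)       # continuation of the present segment
    spread = 0.6 * gaussian(x, 100.0, 60.0)   # positions after a change of course
    return spike + spread

P = [future_density(x) for x in XS]
mean_position = sum(x * p for x, p in zip(XS, P)) * DX  # least squares aiming point

def hit_probability(y, lethal_radius):
    """R(y) = integral of P(x) Q(x, y) dx, with Q an assumed Gaussian in x - y."""
    return sum(p * math.exp(-0.5 * ((x - y) / lethal_radius) ** 2)
               for x, p in zip(XS, P)) * DX

def best_aim(lethal_radius):
    return max(range(-50, 201), key=lambda y: hit_probability(y, lethal_radius))

aim_small = best_aim(5.0)    # small lethal radius: aim at the spike (the mode)
aim_large = best_aim(200.0)  # very large lethal radius: aim moves toward the mean
print(aim_small, aim_large, round(mean_position, 1))
```

With the small radius the maximum sits on the spike; with the large radius it moves close to the mean of the distribution, which is the point the least squares criterion would select.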
Now the most probable future position in our
case is the spike of probability corresponding
to the analytic extrapolation of the present seg-
ment of the target course. To determine its
position one must find the parameters of this
segment and evaluate for t_f seconds in the
future. For example, if the segments are as-
sumed to be straight lines (constant velocity
target) the velocity components are determined
and multiplied by t_f to give the predicted
change in position. These changes are added to
the present position to give the future position.
If helical or parabolic segments are assumed,
the parameters of these curves are determined
from the past data, and the curves extrapo-
lated t_f seconds into the future.
These conclusions may be contrasted with
the idea of aiming at the point which mini-
mizes the mean square error. The least squares
criterion amounts to aiming at the mean or
center of gravity of the future distribution of
position. This point will ordinarily be under
the continuous part of the distribution and not
at the spike; e.g., the point marked in Figure 1.
Its position depends to a considerable extent on
distant parts of the distribution, which would
surely be complete misses in any case. The
chief advantage of the least squares criterion
is that it fits in well with the mathematical
tools suitable to these problems, leading to
solvable equations.
The least squares criterion will still appear
in our analysis in that we attempt to smooth
our course parameters in such a way as to
minimize the mean square error in these, a
very different thing from minimizing the mean
square error in the predicted position of the
target.
9.4 NECESSITY OF A SHARP CUTOFF
The changes in the course parameters be-
tween adjacent segments can be very large.
Also, at the start of operations and in changing
from one target to another there will be large
and erratic variation of the input to the
smoothing and predicting circuits, unrelated to
the present target course. If any of these data
are used in prediction, the result will almost
surely be a miss because of the small lethal
radius of the shell. The only way to eliminate
these errors in a linear invariable system is to
have all weighting functions cut off sharply
after a short time. Then all data over a certain
age are eliminated. Hits will occur only when
the target has been on a predictable segment for
this length of time or more and remains there
at least t_f seconds in the future.
Suppose the weighting function for velocity
has a 1 per cent tail beyond the cutoff point
and that the trackers start following the target
from a zero position. Then after the smoothing
time there will be, because of the lack of exact
cutoff, a 1 per cent error in velocity. If the
time of flight were 15 seconds and the target
velocity 200 yards per second, this represents
an error of 30 yards in predicted position.
Since this is comparable to the other errors in
a typical director, we conclude that the tail of
the smoothing curve should not be much greater
than 1 per cent of its total area.
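The arithmetic of this example is simply the product of the tail fraction, the target velocity, and the time of flight. A minimal sketch, with a hypothetical function name:

```python
def tail_position_error(tail_fraction, target_speed, time_of_flight):
    """Position error caused by a weighting-function tail of the given area."""
    velocity_error = tail_fraction * target_speed  # yards per second
    return velocity_error * time_of_flight         # yards

error_yards = tail_position_error(0.01, 200.0, 15.0)
print(f"{error_yards:.1f} yards")
```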
9.5 CALCULATION OF THE BEST
SMOOTHING TIME
Under the assumptions we have made, the
proper smoothing time to maximize the number
of hits can be determined as follows. Let P(l)
be the probability that a predictable segment
of the course lasts for l seconds or more. In
the Poisson case this function is
\[ P(l) = e^{-l/a} . \]
With a given smoothing time S there will be a
certain probability of hitting the target, as-
suming it has been on the present segment for
S seconds in the past and will remain there for
t_f seconds in the future. We assume changes
in course to be so large that any change re-
sults in a miss. This probability of a hit Q(S),
provided it remains on the course, will be an
increasing function of S. Ordinarily the stand-
ard deviation will decrease as the square root
of the smoothing time. We have assumed the
lethal radius of the shell small compared to the
dispersion of shells about the target. The prob-
ability of a hit will then vary inversely with
the volume through which the shells are dis-
persed. If the gun itself had no dispersion but
all errors were due to tracking errors (and if
the tracking error spectrum is flat), the prob-
ability of a hit would then vary as K S^{3/2} for
S in the region of interest. This is because
there are three dimensions and the expected
error in each of these is decreasing as S^{-1/2}.
With gun dispersion present, Q(S) will have
the form
\[ Q(S) = K \left( \sigma_1^2 + \sigma_2^2 \, \frac{a}{S} \right)^{-3/2} \]
where σ_1 is the standard deviation due to the
gun dispersion, and σ_2 √(a/S) that due to track-
ing errors. The sum of the squares is the total
variance in each dimension and the three-
halves power gives the total dispersion volume.
When these two functions P(l) and Q(S)
are known, the best smoothing time is that
which maximizes the product
\[ P(S + t_f) \cdot Q(S) . \]
The first term is the probability of a predict-
able segment of the course lasting S + t_f sec-
onds, and the second term is the probability of
a hit if it does last that long. Therefore, the
product is the probability of a hit with smooth-
ing time S.
In the Poisson case, with no gun dispersion,
the calculation is as follows :
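\[ P(l) = e^{-l/a} \]
\[ P(S + t_f) = e^{-(S + t_f)/a} = A \, e^{-S/a} \]
\[ Q(S) = K S^{3/2} \]
\[ f(S) = P(S + t_f) \, Q(S) = B \, e^{-S/a} S^{3/2} \]
\[ f'(S) = B \left[ \tfrac{3}{2} \, e^{-S/a} S^{1/2} - \tfrac{1}{a} \, e^{-S/a} S^{3/2} \right] = 0 \]
\[ S = \tfrac{3}{2} \, a \]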
The proper smoothing time is 3/2 of the aver-
age segment length, and is independent of the
time of flight and all other factors.
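The result S = 3a/2, and its independence of the time of flight, can be confirmed by a direct search over S. This is an illustrative sketch only; the value of a, the grid step, and the function names are assumptions, and the constant factors A, B, and K are dropped since they do not move the maximum.

```python
import math

def hit_rate(S, a, t_f):
    """P(S + t_f) * Q(S) for the Poisson case with no gun dispersion (constants dropped)."""
    return math.exp(-(S + t_f) / a) * S ** 1.5

def best_smoothing_time(a, t_f, step=0.01):
    """Brute-force search for the S that maximizes the hit rate."""
    grid = [step * k for k in range(1, int(10 * a / step) + 1)]
    return max(grid, key=lambda S: hit_rate(S, a, t_f))

a = 10.0  # assumed average segment length, seconds
for t_f in (5.0, 20.0):
    print(t_f, round(best_smoothing_time(a, t_f), 2), 1.5 * a)
```

The time of flight enters only through the factor exp(-t_f/a), which scales the whole curve without shifting its peak.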
The presence of gun dispersion and computer
errors which are independent of smoothing
time decreases the best S from this value. In
this case the equation for optimal S is the
quadratic
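\[ \sigma_1^2 \left( \frac{S}{a} \right)^2 + \sigma_2^2 \, \frac{S}{a} - \frac{3}{2} \, \sigma_2^2 = 0 ; \]
hence
\[ \frac{S}{a} = \frac{-\sigma_2^2 + \sigma_2 \sqrt{\sigma_2^2 + 6 \sigma_1^2}}{2 \sigma_1^2} . \]
Here σ_1 is the part of the errors which is in-
dependent of smoothing time (dispersion
errors in the computer, etc.) and σ_2 is the error
which varies inversely with the square root of
S, σ_2 being its value at S = a. Ordinarily σ_1 is
several times σ_2, in which case we have approxi-
mately
\[ \frac{S}{a} \approx \frac{\sigma_2 \sqrt{3}}{\sigma_1 \sqrt{2}} . \]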
There are other factors which we have neg-
lected, which decrease the best smoothing time
still further. The wandering of the target about
the predictable segments assumed in the above
simplified analysis makes old data less reliable
and therefore reduces S. Also, there is the tac-
tical consideration that when starting to track
a target it is desirable to commence firing as
soon as possible, even if reducing this time
makes individual hits somewhat less probable.
For these and other reasons the best smooth-
ing time will be just a fraction of a.
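The effect of gun dispersion on the best smoothing time can be sketched numerically. The code below takes the hit probability in the form exp(-S/a) (σ_1² + σ_2² a/S)^(-3/2) described above, writes x = S/a, and compares the positive root of the resulting quadratic with a brute-force maximum and with the large-dispersion approximation. The numerical values and function names are assumptions for illustration.

```python
import math

def best_ratio(sigma1, sigma2):
    """Positive root x = S/a of sigma1^2 x^2 + sigma2^2 x - (3/2) sigma2^2 = 0."""
    if sigma1 == 0.0:
        return 1.5  # no gun dispersion: S = 3a/2
    root = math.sqrt(sigma2 ** 2 + 6.0 * sigma1 ** 2)
    return (-sigma2 ** 2 + sigma2 * root) / (2.0 * sigma1 ** 2)

def approx_ratio(sigma1, sigma2):
    """Approximation for sigma1 several times sigma2."""
    return math.sqrt(1.5) * sigma2 / sigma1

def hit_rate(x, sigma1, sigma2):
    """exp(-S/a) * (total variance)^(-3/2), written in terms of x = S/a."""
    return math.exp(-x) * (sigma1 ** 2 + sigma2 ** 2 / x) ** -1.5

def numeric_ratio(sigma1, sigma2, step=0.001):
    grid = [step * k for k in range(1, 3001)]
    return max(grid, key=lambda x: hit_rate(x, sigma1, sigma2))

for s1 in (0.0, 1.0, 3.0, 10.0):
    exact = best_ratio(s1, 1.0)
    line = f"sigma1={s1:5.1f}  S/a exact={exact:.4f}  numeric={numeric_ratio(s1, 1.0):.3f}"
    if s1 > 0:
        line += f"  approx={approx_ratio(s1, 1.0):.4f}"
    print(line)
```

The ratio S/a falls steadily below 3/2 as the dispersion term grows, and the approximation becomes closer as σ_1 becomes large compared with σ_2.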
9.6 NONLINEAR AND VARIABLE
SYSTEMS
The compromise required in choosing a cer-
tain definite smoothing time can be eliminated
by the use of nonlinear elements. In particular,
if a method is devised for determining when
changes of course occur, this indication can be
used to start a new linear but variable smooth-
ing operation, so that the device uses all the
data pertinent to the present segment and no
data from previous segments. There is a clear
improvement in such cases although not so
great as might be expected. There are many
practical difficulties in proper adjustment of
such a "trigger" action. If the trigger is too
sensitive it will assume new segments due
merely to tracking noise and seldom allow suffi-
cient smoothing for accurate fire. If it is too
insensitive it fails in its function of quickly
locating changes of segment. Since the noise
and target courses are subject to considerable
variation, this adjustment is not easy.
In such a system the smoothing may be
linear — the only nonlinearity is the tripping
circuit. The analysis of best weighting func-
tions, etc., given in later chapters can for the
most part be applied to such cases. There may
also be advantages to be derived from making
the smoothing operator depend on the general
position in space of the target relative to the
gun. The smoothing time may be varied, for
example, as a function of the time of flight.
This type of variation would be slow compared
to the noise frequency, and here again the
linear analysis can be used.
Whether any real advantage can be obtained
by "strongly" nonlinear smoothing in practical
cases other than these two possibilities is ques-
tionable.
Chapter 10
SMOOTHING FUNCTIONS FOR CONSTANTS
The analytic arc assumption described in
the previous chapter immediately allows us
to reduce a vast proportion of data-smoothing
problems to a relatively concrete form. Obvi-
ously the arc will be specified by a number of
parameters and the principal object of the com-
puting and data-smoothing circuits must be to
isolate values of these parameters on the basis
of which a prediction can be made. In practi-
cal cases the instantaneous values of the
parameters are isolated by coordinate con-
verters. The function of the data-smoothing
circuit is to provide a suitable average from
these instantaneous values. This is called
"smoothing a constant" here since the param-
eters are assumed to be constant along each
arc, although they may change radically from
one arc to another.
The data-smoothing network is most con-
veniently specified by its impulsive admittance.
(See Appendix A.) In accordance with the
assumptions made in the previous chapter, it
will be assumed that the desired impulsive ad-
mittance is identically zero after some limiting
time T. Thus, T seconds after a change from
one analytic arc to the next the new parameter
value is established. T is the so-called "settling
time" of the data-smoothing network.
With the settling time limit given, the prob-
lem of choosing a suitable data-smoothing net-
work reduces to that of finding the best shape
of the impulsive admittance characteristic for
t < T. Obviously this shape determines how
the output of the network changes in going
from the parameter value appropriate for the
first arc to that appropriate for the second. The
exact way in which the response settles from
one constant value to the next is, however,
usually of comparatively little interest. The
shape of the weighting function is of impor-
tance chiefly because of its effect on the noise.
For each noise spectrum there is, in principle,
an optimum shape for the weighting function.
The present chapter approaches the problem of
choosing a shape which will minimize the effect
of noise from several points of view.
It should be noted that the term noise as used
here does not necessarily refer to the errors
associated directly with the tracking data. The
tracking data may have been subjected to co-
ordinate conversions, differentiations, or other
processes of computation before reaching the
data-smoothing network.ᵃ The noise associated
with the signal to be smoothed thus will usually
have characteristics differing from those of the
noise associated with the tracking data.
10.1 EXPONENTIAL SMOOTHING
Before attacking the problem of smoothing a
constant in a systematic way it is worth while
to consider an important special case. This is
the so-called exponential smoothing circuit. It
leads to a data-smoothing network in which
the output V is related to the input E by
\[ V(t) = \alpha \int_0^\infty e^{-\alpha \tau} \, E(t - \tau) \, d\tau \]
so that the impulsive admittance W(t) is an
exponential function of time, as illustrated by
Figure 1.
Figure 1. Simple exponential weighting function.
a In exceptional circumstances the physical apparatus
in which these processes are carried out may also be
sources of additional noise.
An impulsive admittance of the type shown
in Figure 1 does not show any very definite
settling time. The exponential curve ap-
proaches zero gradually, and it is a long time
after a change in course before the effects of
the data obtained on the old course are negli-
gible. This is obviously an undesirable result,
and the exponential weighting function is con-
sequently not a recommended one for situations
to which the analytic arc assumption applies.
The exponential solution is, however, described
here because it occurs in such a vast variety of
cases. It is found, in fact, whenever the data-
smoothing device is specified by a linear first-
order differential equation with constant coeffi-
cients. It may thus correspond to many simple
situations. For example, this is the result
which would be obtained in an electrical circuit
if we smoothed the data by placing a simple
shunt capacity across a resistance circuit. In
mechanical structures it is encountered when-
ever the damping depends either upon simple
inertia or a simple compliance.
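The lack of a definite settling time can be seen in a sampled version of the first-order smoother. The sketch below is an assumed discrete-time equivalent, not a circuit from the text: it compares the response to a sudden change of parameter value of an exponential smoother and of a uniform average over a finite window. The time constant, window length, and function names are illustrative.

```python
import math
from collections import deque

def exponential_smooth(samples, dt, time_constant, initial=0.0):
    """Sampled form of a first-order smoother: dy/dt = (x - y) / time_constant."""
    keep = math.exp(-dt / time_constant)  # fraction of the old output retained per step
    out, y = [], initial
    for x in samples:
        y = keep * y + (1.0 - keep) * x
        out.append(y)
    return out

def uniform_smooth(samples, window, initial=0.0):
    """Equal-weight average of the last `window` samples (sharp cutoff)."""
    recent = deque([initial] * window, maxlen=window)
    out = []
    for x in samples:
        recent.append(x)
        out.append(sum(recent) / window)
    return out

step_input = [1.0] * 20  # the parameter jumps from 0 to 1 at the first sample
exp_out = exponential_smooth(step_input, dt=1.0, time_constant=5.0)
uni_out = uniform_smooth(step_input, window=10)

# Residual influence of the old value after 10 seconds.
print(1.0 - exp_out[9], 1.0 - uni_out[9])
```

After ten seconds the uniform average has forgotten the old value completely, while the exponential smoother still carries a fraction exp(-2) of it.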
Simple exponential smoothing also occurs in
a variety of other situations which may be
somewhat less obvious. For example, it is the
effective result in either an aided laying or a
regenerative tracking scheme whenever the
ratio between rate and displacement correc-
tions is fixed. Another somewhat similar ex-
ample is furnished by the feedback amplifier
circuit shown in Figure 2.
Figure 2. Feedback amplifier circuit giving simple
exponential weighting function.
Since rapid fluctuations in the output of this amplifier are fed
back through the capacity and tend to oppose
the input voltage, the structure acts as a
smoother, and more detailed analysis would
show that it has characteristics similar to those
obtained by using a shunt capacity across a
resistance circuit. The structure is introduced
here because considerable use is made of it in
connection with the discussion of nonlinear
smoothing in a later chapter.
One simple conclusion about data-smoothing
networks can be drawn immediately from this
discussion. Since all structures simple enough
to be specified by a first-order differential equa-
tion give exponential smoothing, which has no
very well-marked settling time, it is clear that
a data-smoothing network which shows a well-
defined settling time must probably be at least
moderately complicated.
10.2 CURVE-FITTING METHOD
Consider the signal E shown in Figure 3
under the assumption that the true signal is
constant and the superposed noise is random
with a flat spectrum. The best constant A, in
the least squares sense, which can be fitted to
the signal from t - T to t is that which mini-
mizes
\[ \int_{t-T}^{t} [A - E(\lambda)]^2 \, d\lambda , \]
viz.,
\[ A = \frac{1}{T} \int_{t-T}^{t} E(\lambda) \, d\lambda . \qquad (1) \]
Figure 3. Piecewise constant signal with noise.
Comparing this with equation (2), Appendix
A, it will be seen that A, which is obviously a
function of t, is the response to the assumed
signal of a network whose impulsive admit-
tance is
\[ W(t) = \frac{1}{T} , \qquad 0 < t < T . \qquad (2) \]
This is the best weighting function for smooth-
ing under the assumed circumstances. It is
illustrated in Figure 4.
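In sampled form the same statement is that the constant which minimizes the sum of squared deviations over the window is the plain average of the samples. A minimal sketch follows; the sample values and function names are made up for illustration.

```python
def squared_error(constant, window):
    """Sum of squared deviations of the samples from a trial constant."""
    return sum((constant - e) ** 2 for e in window)

def best_constant(window):
    """Least squares constant: the equal-weight average of the window."""
    return sum(window) / len(window)

window = [10.3, 9.6, 10.1, 9.9, 10.4, 9.8, 10.2, 9.7]  # noisy samples of a constant
A = best_constant(window)

# Any other constant gives a larger squared error.
for trial in (A - 0.05, A, A + 0.05):
    print(round(trial, 3), round(squared_error(trial, window), 4))
```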
Figure 4. Best weighting function for smoothing
piecewise constant signal.
A more complex situation is one in which the
true signal is a line of constant slope with
superposed flat random noise, as shown in Fig-
ure 5. For convenience the analysis will be
conducted in terms of the age variable τ = t - λ.
Figure 5. Piecewise linearly varying signal with
noise.
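The best straight line A - Bτ which can be fit-
ted to the signal from τ = 0 to τ = T is that
which minimizes
\[ \int_0^T [A - B\tau - E(t - \tau)]^2 \, d\tau . \]
Hence A and B must satisfy simultaneously
\[ A - \frac{T}{2} B = \frac{1}{T} \int_0^T E(t - \tau) \, d\tau , \qquad
   \frac{T}{2} A - \frac{T^2}{3} B = \frac{1}{T} \int_0^T \tau \, E(t - \tau) \, d\tau . \qquad (3) \]
Eliminating A, we get
\[ B = \frac{6}{T^3} \int_0^T E(t - \tau) \, (T - 2\tau) \, d\tau , \]
whence by partial integration
\[ B = \frac{6}{T^3} \int_0^T E'(t - \tau) \cdot \tau (T - \tau) \, d\tau . \]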
Comparing this with (7), Appendix A, it will
be seen that B, which is obviously a function of
t, is the response to the derivative of the as-
sumed signal of a network whose impulsive
admittance is
\[ W(t) = \frac{6}{T^3} \, t (T - t) , \qquad 0 < t < T . \qquad (4) \]
This is the best weighting function for smooth-
ing the derivative of the signal under the as-
sumed circumstances. It is illustrated in Fig-
ure 6 and is generally referred to as the "para-
bolic weighting function."
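The same identity holds for equally spaced samples: the least squares slope of the fitted line equals a weighted sum of the first differences of the data, with weights lying on a parabola that vanishes at both ends of the window. The sketch below checks this numerically. The discrete weights 6(k+1)(N-k)/[N(N+1)(N+2)] are the sampled analogue assumed here, and the test data are synthetic.

```python
import random

def least_squares_slope(samples):
    """Slope of the best straight line through equally spaced samples."""
    n = len(samples) - 1
    mid = n / 2.0
    numerator = sum((i - mid) * x for i, x in enumerate(samples))
    denominator = sum((i - mid) ** 2 for i in range(n + 1))
    return numerator / denominator

def parabolic_weights(n):
    """Discrete parabolic weights for the n first differences; they sum to 1."""
    scale = 6.0 / (n * (n + 1) * (n + 2))
    return [scale * (k + 1) * (n - k) for k in range(n)]

def weighted_difference_slope(samples):
    """Smoothed derivative: parabolic weighting applied to the first differences."""
    n = len(samples) - 1
    differences = [samples[k + 1] - samples[k] for k in range(n)]
    return sum(w * d for w, d in zip(parabolic_weights(n), differences))

rng = random.Random(1)
samples = [0.5 * i + rng.gauss(0.0, 1.0) for i in range(21)]  # slope 0.5 plus noise

print(least_squares_slope(samples), weighted_difference_slope(samples))
```

The two numbers agree to rounding error, which is the discrete counterpart of the partial integration leading to the parabolic weighting function.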
Figure 6. Best weighting function for smoothing
piecewise linearly varying signal.
It should be noted also that the right-hand
member of the first of equations (3) is form-
ally the same as that of equation (1). Hence
the response of the network specified by (2)
and illustrated in Figure 4, to the type of
signal shown in Figure 5, will correspond to
the value on the best straight line T/2 seconds
back from t, the present time. This network is
still the best for smoothing the signal, but it
introduces a delay of one half of the smooth-
ing time. The delay may be reduced only at
the price of a reduction in smoothing unless the
smoothing time is increased.
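The delay of half the smoothing time is easy to verify on a noise-free ramp. In this sketch the slope, intercept, window length, and names are assumed values chosen for illustration.

```python
def uniform_average(values):
    return sum(values) / len(values)

def ramp(time, intercept=5.0, slope=2.0):
    """Noise-free linearly varying signal."""
    return intercept + slope * time

T = 10    # smoothing time, seconds
now = 30  # present time, seconds

# Samples of the signal for ages 0 .. T seconds.
window = [ramp(now - age) for age in range(T + 1)]
smoothed = uniform_average(window)
lag = ramp(now) - smoothed

print(smoothed, ramp(now - T / 2), lag)
```

The uniform average returns the value the ramp had T/2 seconds ago, so the output trails the present value by the slope times T/2.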
AUTOCORRELATION METHOD
The autocorrelation method with finite set-
tling time was first used by G. R. Stibitz in
numerical determination of the best weighting
function for smoothing the derivative of track-
ing data with typical tracking errors. This
method was also used to determine the sensitiv-
ity of smoothing to departures of the weighting
function from the best form.
The analysis is based upon the expression
\[ V(t) = \int_0^T g'(t - \tau) \, W(\tau) \, d\tau , \qquad t > T \]
for the response to the derivative of the error
time function g(t) of a network whose impul-
sive admittance or weighting function W(t) is
identically zero for t > T as well as for t < 0.
Since measured tracking errors are generally
tabulated only at 1-second intervals, the in-
tegral may be approximated by the sum
- 1
m+Oi)
m-(H)
for integral values of t.
The instantaneous transmitted power is the
square of this expression, and the average
transmitted power is
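\[ P_{av} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} V^2(t) . \]
This may be expressed in the form
\[ P_{av} = \sum_{m} \sum_{n} W_{m-(1/2)} \, C_{m-n} \, W_{n-(1/2)} \qquad (5) \]
where
\[ C_{m-n} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} [g(t-m+1) - g(t-m)] \, [g(t-n+1) - g(t-n)] \]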
is the autocorrelation of the errors. Having
computed the autocorrelation, (5) may be mini-
mized with respect to the W's by familiar
methods, under the constraint
\[ \sum_{m=1}^{T} W_{m-(1/2)} = 1 . \]
The values of W thus obtained are the speci-
fication of the best weighting function.ᵇ Equa-
tion (5) may then be used to determine the
sensitivity of smoothing to departures of the
weighting function from the best form.
Proceeding along this line, Stibitz found that
the best weighting function for typical actual
tracking errors was generally intermediate to
the uniform and parabolic ones shown in Fig-
ures 4 and 6. Furthermore, Stibitz found
that the difference in smoothing obtained from
the best weighting function on the one hand
and from the uniform or the parabolic weight-
ing function on the other hand, is negligible in
practice.
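The constrained minimization can be sketched for a simple assumed case. If the tabulated errors were uncorrelated from second to second with unit variance, their first differences would have autocorrelation 2 at lag 0, -1 at lag 1, and 0 elsewhere. Minimizing the quadratic form subject to unit total weight by a Lagrange multiplier (one reading of the "familiar methods", assumed here) gives weights proportional to the solution of C u = 1. The solver and all names below are illustrative, and measured autocorrelations would replace the assumed ones in practice.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (rows[i][n] - tail) / rows[i][i]
    return x

def best_weights(autocorrelation, count):
    """Minimize sum_m sum_n W_m C(m-n) W_n subject to sum_m W_m = 1."""
    C = [[autocorrelation(m - n) for n in range(count)] for m in range(count)]
    u = solve(C, [1.0] * count)   # stationarity condition: C W = constant
    total = sum(u)
    return [value / total for value in u]

def white_error_differences(lag):
    """Autocorrelation of first differences of uncorrelated unit-variance errors."""
    return {0: 2.0, 1: -1.0, -1: -1.0}.get(lag, 0.0)

M = 8
weights = best_weights(white_error_differences, M)

raw = [k * (M + 1 - k) for k in range(1, M + 1)]
parabola = [value / sum(raw) for value in raw]
print([round(w, 4) for w in weights])
print([round(p, 4) for p in parabola])
```

For this flat-error assumption the computed weights fall on a parabola, in agreement with the curve-fitting result for derivative smoothing.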
The autocorrelation method was later for-
malized by R. S. Phillips and P. R. Weiss who
incorporated it into a theory of prediction.7 A
brief exposition of this formulation is given
in Appendix B.
ELEMENTARY PULSE METHOD
For the purposes of this method, an ele-
mentary noise pulse is defined by a time func-
tion F0(t) which satisfies the following require-
ments:
1. Identically zero when t < 0.
2. Contains no terms which increase expo-
nentially with time.
3. Power spectrum N(ω²) is the same as that
of the noise.
The noise is then regarded as the result of
elementary noise pulses started at random.
Alternatively, it may be regarded as the result
of flat random noise passed through a network
whose transmission function is S(p) = L[F_0(t)].
As a matter of fact, only S(p) is
required in the analysis, and this is readily de-
termined from the relation
\[ |S(i\omega)|^2 = N(\omega^2) , \]
together with the condition that S(iω) cor-
responds to the transmission function of a
minimum-phase physical structure (cf. Appen-
dix B).
The response F(t) to the elementary noise
pulse F_0(t) of a network whose impulsive ad-
mittance is W(t) is given by the operational
equation
\[ F(t) = S(p) \cdot W(t) \]
in accordance with the footnote in Section A.5,
Appendix A. The best form for W(t) is there-
fore that which minimizes the integral
\[ \int_0^\infty [F(t)]^2 \, dt \qquad (6) \]
under the restriction
\[ \int_0^{t_0} W(t) \, dt = 1 \qquad \text{when } t_0 > T . \qquad (7) \]
b The computations involved may be considerably re-
duced by noting the symmetry property proved in Sec-
tion B.2, Appendix B.
This is as much of the elementary pulse
method as we shall need in order to reconsider
the cases treated in Section 10.2. For the treat-
ment of more general cases the method is de-
scribed in greater detail in Appendix B.
The minimization of the integral (6) under
the restriction (7) reduces to a simple isoperi-
metric problem in the calculus of variations, in
cases in which S(p) is a polynomial in p. It is
essential first of all, however, to note that if
S(p) is of degree n, the integral (6) will con-
verge only if W(t) is differentiable at least n
times. In other words, W(t) must have con-
tinuous derivatives of all orders up to the
(n-1)th inclusive, although the nth derivative
may have finite discontinuities. In particular,
if W(t) is to be zero outside of 0 < t < T, its
derivatives of orders up to the (n-1)th inclu-
sive must vanish at both t = 0 and t = T. These
2n boundary conditions must be imposed on the
solution of the Euler equation which in this
case is
\[ S(p) \, S(-p) \, W(t) = \lambda . \]
λ is a constant parameter which is finally ad-
justed so that the restriction (7) is satisfied.
The first case treated in Section 10.2 is one
in which N(ω²) = 1, whence S(p) = 1 and F(t)
= W(t). The integral (6) is a minimum under
the restriction (7) if W(t) is constant by
intervals. The restriction (7) then requires
W(t) to be of the form (2).
The case of first derivative smoothing treated
in 10.2 is one in which N(ω²) = ω², whence S(p)
= p and F(t) = W'(t). If the integral (6) is to
converge at all, W'(t) must not have discon-
tinuities of impulsive or higher type; in other
words, W(t) must be continuous through all
values of t. The integral is a minimum under
the restriction (7) if W''(t) is constant by
intervals. The restriction (7) then requires
W(t) to be of the form (4).
These results may be generalized immediately. In whatever way the
signal to be smoothed may have been derived from the tracking data, let
the power spectrum of the noise associated with it be
$N(\omega^2) = \omega^{2n}$. Then $S(p) = p^n$ and
$F(t) = W^{(n)}(t)$. If the integral (6) is to converge at all,
$W^{(n-1)}(t)$ must be continuous through all values of $t$. The
integral is a minimum under the restriction (7) if $W^{(2n)}(t)$ is
constant by intervals. The restriction (7) then requires $W(t)$ to be of
the form
\[ W(t) = \frac{(2n+1)!}{(n!)^2} \, \frac{1}{T} \left[ \frac{t}{T} \left( 1 - \frac{t}{T} \right) \right]^n , \qquad 0 < t < T . \tag{8} \]
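As a quick numerical check on a weighting function of the form (8), the
sketch below evaluates it for a few values of n and confirms that its
area over 0 < t < T is unity, as the restriction (7) requires. The
function names and the midpoint-rule integration are illustrative
choices only, not part of the original analysis.

```python
from math import factorial

def best_weight(t, n, T=1.0):
    """Weighting function of the form (8); zero outside 0 < t < T."""
    if not 0.0 < t < T:
        return 0.0
    u = t / T
    scale = factorial(2 * n + 1) / factorial(n) ** 2
    return scale / T * (u * (1.0 - u)) ** n

def area(n, T=1.0, steps=20000):
    """Midpoint-rule estimate of the integral of W(t) over 0 < t < T."""
    h = T / steps
    return h * sum(best_weight((k + 0.5) * h, n, T) for k in range(steps))

if __name__ == "__main__":
    for n in range(4):
        print(n, round(area(n, T=2.0), 6))
```

For n = 0 and n = 1 the function reduces to the flat and parabolic
shapes discussed in the two special cases above.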
It may be noted that the convergence re-
quirements which arise in the foregoing dis-
cussion are directly related to the discussion
and theorem in Section A.8, Appendix A, with
respect to the relationship between discontinui-
ties in the impulsive admittance and its deriva-
tives on the one hand, and the ultimate cutoff
characteristic of the transmission function on
the other hand. The continuity of $W^{(n-1)}(t)$ is
obviously required to make the transmission
fall off ultimately at the rate of $6(n+1)$ db per
octave against the rise of $6n$ db per octave in
the noise power spectrum.
The integral (6) may also be used to evaluate the relative advantage of
the best weighting function over another weighting function. As an
example, consider the case where the weighting function (2) is the best.
The value of the integral (6) in this case is $1/T$. If the weighting
function (4) is used against the same noise, the value of the integral
(6) is $6/(5T)$. Hence, as far as rms error or standard deviation is
concerned, the second weighting function is $\sqrt{5/6}$ or 0.913 as
efficient as the first.
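The comparison just made can be reproduced numerically. The sketch
below assumes that form (2) is the flat function 1/T and form (4) the
parabola 6t(T - t)/T^3 on 0 < t < T (the n = 0 and n = 1 cases of (8));
both shapes, and the helper names, are assumptions made for
illustration.

```python
from math import sqrt

def integrate(f, a, b, steps=20000):
    """Composite midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def noise_integral(weight, T):
    """Integral (6) for flat noise, where F(t) = W(t)."""
    return integrate(lambda t: weight(t) ** 2, 0.0, T)

T = 1.0
flat = lambda t: 1.0 / T                          # assumed form (2)
parabolic = lambda t: 6.0 * t * (T - t) / T ** 3  # assumed form (4)

best = noise_integral(flat, T)        # expected to be close to 1/T
other = noise_integral(parabolic, T)  # expected to be close to 6/(5T)
efficiency = sqrt(best / other)       # ratio of rms errors

if __name__ == "__main__":
    print(best, other, efficiency)
```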
Chapter 11
SMOOTHING FUNCTIONS FOR GENERAL POLYNOMIAL EXPANSIONS
THE THEORY of "smoothing a constant" de-
veloped in the preceding chapter will be
extended in this chapter to the problem of
smoothing a polynomial function of time of any
prescribed degree. The extension is, however,
restricted to the case of a flat noise spectrum.
In addition to the smoothing problem, the
analysis also provides a way of designing a
network which will extrapolate the polynomial
a given distance $t_f$ into the future. The network
is so arranged that $t_f$ is continuously variable.
In addition, the degree of the polynomial can
readily be changed to fit changes in the com-
plexity of the assumed form of the data, apart
from noise.
It is clear that these results amount, in a
certain sense, to an alternative to Wiener's
method for the design of prediction circuits for
general time series. Thus, to predict a time
series of any given complexity we would need
only to begin with a polynomial of sufficiently
high degree to fit the observed data, and extra-
polate. Aside from the restriction to a flat
noise spectrum, perhaps the most obvious dif-
ference from Wiener's method is the fact that
the settling time restriction limits the data
upon which the prediction rests to a finite in-
terval in the past. To advance such a prediction
theory seriously, however, it would be neces-
sary to go much farther into the way in which
the degree of the polynomial is established and
the justification for assuming that the extra-
polated value represents a probable future
value for the function.ᵃ
This general discussion will not be under-
taken here. Since prediction with high degree
polynomials will certainly be sensitive to minor
irregularities in the data, tracking errors
would necessarily limit the application of the
method in any case. If we confine ourselves to
reasonably low degree polynomials, however,
a As an example of possible difficulties we may notice
the fact that two polynomials of different degree which
approximate a given function as closely as possible, in
a least squares sense, in a prescribed interval fre-
quently differ radically outside that interval.
the method is useful. An example is furnished
by the prediction of airplane position, in rec-
tangular coordinates, by quadratic functions of
time. Here the square terms represent the
effects of accelerations in the various coordi-
nates. We can defend the inclusion of such
terms on the ground that it is plausible to as-
sume that an airplane may experience constant
accelerations, due to turns, the force of gravity,
etc., for considerable periods of time. The
linear term represents plane velocity and needs
no defense. The constant term, of course, gives
the plane position at some reference time. In-
cluding it in the smoothing operation is equiva-
lent to introducing "present-position" smooth-
ing of the sort suggested by the broken lines
in Figure 1 of Chapter 7.ᵇ
Aside from its direct interest as a possible
prediction method, the analysis in this chapter
is also of indirect interest for the additional
light it sheds on the effect of the noise spec-
trum on smoothing functions. It turns out that
smoothing a power of time, with a flat noise
spectrum, is equivalent to smoothing a constant
with a somewhat different noise spectrum.
Thus the smoothing functions developed for
polynomials are also useful as special cases of
smoothing functions applicable to constants.
11.1
Let $\lambda$ be any past value of time and let $t$ be the present
value. If the data is fitted with a smooth curve $\bar{E}(\lambda)$, the
predicted value may be taken as $\bar{E}(t + t_f)$. The procedure of
fitting is the familiar one of minimizing the integral
\[ \int_{-\infty}^{t} [E(\lambda) - \bar{E}(\lambda)]^2 \, W_0(t, \lambda) \, d\lambda \]
b In the circuit of Figure 1, Chapter 7, however, the
smoothing network would produce a lag in the present-
position data delivered to the prediction circuit, and
this lag would, of course, mean some error in follow-
ing a moving target. In the method described in this
chapter such lags are automatically compensated for
by adjustments in the coefficients of the other terms of
the polynomial.
with respect to disposable parameters in $\bar{E}(\lambda)$ and a
prescribed weighting function $W_0(t, \lambda)$. The lower limit of the
integral is indicated as $-\infty$ in compliance with the physical
impossibility of discriminating between relevant and irrelevant data,
with fixed linear networks, except on the basis of age. The burden of
discrimination must be relegated to the weighting function which must be
a function only of the age $t - \lambda$. Under the ideal restriction
that $W_0(t - \lambda)$ is identically zero when $t - \lambda > T$ or
$\lambda < t - T$, the indicated lower limit of the integral is purely
nominal.
As in Section 10.2, it is convenient to conduct the analysis in terms of
the age variable $\tau = t - \lambda$ introduced there. If
\[ F(\tau) = E(\lambda) , \qquad \bar{F}(\tau) = \bar{E}(\lambda) , \]
the integral to be minimized may be expressed in the form
\[ \int_0^\infty [F(\tau) - \bar{F}(\tau)]^2 \, W_0(\tau) \, d\tau . \tag{1} \]
In accordance with the discussion of quasi-distortionless transmission
networks in Section A.10, Appendix A, the smooth curve
$\bar{E}(\lambda)$ should be a polynomial in $\lambda$. Hence
$\bar{F}(\tau)$ should be a polynomial in $\tau$. It will be more
convenient, however, to express $\bar{F}(\tau)$ formally as a linear
combination of polynomials in $\tau$ which may be orthogonalized. Hence,
let
\[ \bar{F}(\tau) = V_0 + V_1 G_1(\tau) + V_2 G_2(\tau) + \cdots + V_n G_n(\tau) \tag{2} \]
where $G_m(\tau)$ is an $m$th degree polynomial in $\tau$. Let
$W_0(\tau)$ be normalized in the sense that
\[ \int_0^\infty W_0(\tau) \, d\tau = 1 \]
and the $G_m(\tau)$ be orthogonalized with respect to the weighting
function $W_0(\tau)$ in the sense that
\[ \int_0^\infty G_l(\tau) \, G_m(\tau) \, W_0(\tau) \, d\tau = \begin{cases} 0 & \text{if } l \ne m \\ 1/k_m & \text{if } l = m \end{cases} \]
($G_0 = 1$, $k_0 = 1$).
The integral (1) is then a minimum with respect to the $V_m$'s in (2) if
\[ V_m = k_m \int_0^\infty F(\tau) \, G_m(\tau) \, W_0(\tau) \, d\tau . \tag{3} \]
In terms of the forward time $\lambda$, (2) and (3) reduce to
\[ \bar{E}(\lambda) = V_0(t) + V_1(t) \, G_1(t - \lambda) + V_2(t) \, G_2(t - \lambda) + \cdots + V_n(t) \, G_n(t - \lambda) \tag{4} \]
where
\[ V_m(t) = k_m \int_{-\infty}^{t} E(\lambda) \, G_m(t - \lambda) \, W_0(t - \lambda) \, d\lambda . \tag{5} \]
Expression (5) identifies the $V_m(t)$ as the responses to
$E(\lambda)$ of fixed linear networks whose impulsive admittances are
\[ W_m(\tau) = k_m \, G_m(\tau) \, W_0(\tau) . \tag{6} \]
By (4), the predicted value may be obtained by a linear combination of
the responses of these networks, viz.,
\[ \bar{E}(t + t_f) = V_0(t) + G_1(-t_f) \, V_1(t) + G_2(-t_f) \, V_2(t) + \cdots + G_n(-t_f) \, V_n(t) . \tag{7} \]
A schematic representation of an $n$th order smoothing and prediction
circuit, based on (7), is shown in Figure 1, where the $G_m(-t_f)$ are
represented as potentiometer factors dependent on the time of flight.
Figure 1. Schematic representation of nth order
smoothing and prediction circuit.
Alternatively, (7) may be written
\[ \bar{E}(t + t_f) = \bar{E}(t) + [G_1(-t_f) - G_1(0)] \, V_1(t) + \cdots + [G_n(-t_f) - G_n(0)] \, V_n(t) \tag{8} \]
where $\bar{E}(t)$ is then replaced by $E(t)$ when
position data smoothing is to be omitted.
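The operation described by (5) and (7) can be tried numerically. The
following sketch assumes the uniform weighting function over a unit
smoothing time (the case worked out in Section 11.3), takes G_1 and G_2
from the table given there, and uses k_m = (2m)!(2m+1)!/(m!)^2. The
quadratic course `track`, the helper names, and the Simpson-rule
integration are hypothetical and serve only as an illustration. For
noise-free quadratic data a second order predictor should reproduce the
future value exactly, apart from integration error.

```python
from math import factorial

# G_m(tau) for m = 0, 1, 2 with uniform weighting on 0 <= tau <= 1
# (table of Section 11.3; unit of time = nominal smoothing time).
G = [
    lambda tau: 1.0,
    lambda tau: 0.5 - tau,
    lambda tau: 1.0 / 12.0 - tau / 2.0 + tau * tau / 2.0,
]
# k_m is the reciprocal of the integral of G_m^2 over 0..1.
K = [factorial(2 * m) * factorial(2 * m + 1) / factorial(m) ** 2
     for m in range(3)]

def simpson(f, a, b, steps=200):
    """Composite Simpson rule (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def responses(E, t, order=2):
    """V_m(t) = k_m * integral of E(t - tau) G_m(tau) over 0..1, as in (5)."""
    return [K[m] * simpson(lambda tau, m=m: E(t - tau) * G[m](tau), 0.0, 1.0)
            for m in range(order + 1)]

def predict(E, t, t_f, order=2):
    """Linear combination (7): sum of G_m(-t_f) * V_m(t)."""
    V = responses(E, t, order)
    return sum(G[m](-t_f) * V[m] for m in range(order + 1))

def track(lam):
    """Hypothetical noise-free quadratic course used only for this check."""
    return 3.0 - 2.0 * lam + 0.5 * lam * lam

if __name__ == "__main__":
    print(predict(track, t=4.0, t_f=1.5), track(5.5))
```

The factors `G[m](-t_f)` play the part of the potentiometer settings in
Figure 1; reducing `order` drops the higher terms without altering the
lower ones, which is the switching convenience that orthogonality
provides.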
It is not necessary that the $G_m(\tau)$ polynomials be orthogonal.
However, the circuit switching required to reduce or increase the order
of the prediction is simplest when the $G_m(\tau)$ polynomials are
orthogonal. Orthogonal polynomials corresponding to any weighting
function $W_0(\tau)$ are readily derived by well-known methods.
The weighting function $W_0(\tau)$ may be determined by either of the
methods described in Appendix B as the best weighting function for
smoothing position data, under prescribed tracking error
characteristics. Then the best impulsive admittances $W_m(\tau)$ for a
smoothing and prediction circuit are prescribed by (6).
The relationship (6) shows that if the prescribed weighting function
$W_0(\tau)$ satisfies the formal requirements for physical
realizability, so will all of the impulsive admittances $W_m(\tau)$. Of
the standard sets of orthogonal polynomials those of Laguerre appear to
be the best adapted to physical realization. The Laguerre polynomials
$L_n^{(\alpha)}(\tau)$ are orthogonal in $0 < \tau < \infty$ with the
weighting function $\tau^\alpha e^{-\tau}$. However, such a weighting
function is, in general, very unsatisfactory from the practical point of
view of settling characteristics.
It is possible of course to approximate any prescribed weighting
function $W_0(\tau)$ as closely as may be desired in a physically
realizable form, derive a set of orthogonal polynomials based on the
approximate form, and determine the impulsive admittances $W_m(\tau)$
from (6). However, such a procedure leads to complexities of network
configuration which increase very rapidly with the index $m$. This
increasing complexity is hardly justifiable in practice.
From the foregoing considerations, it appears that the most practical
procedure is to derive all of the impulsive admittances $W_m(\tau)$
without regard to physical realizability, approximate them independently
in physically realizable forms of independently prescribed complexities,
and modify or redetermine the potentiometer factors in accordance with
the discussion in Section A.10, Appendix A.
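The orthogonalization referred to in this section can be carried out
mechanically once the moments of the weighting function are known. The
sketch below applies the Gram-Schmidt process with exact rational
arithmetic. The weighting function used, 6 tau (1 - tau) on
0 < tau < 1, is chosen here only as an example, and the resulting
polynomials are left with unit leading coefficient; any other scaling
merely changes the constants k_m.

```python
from fractions import Fraction

def moment(k):
    """k-th moment of the example weighting function 6 tau (1 - tau) on 0..1."""
    return 6 * (Fraction(1, k + 2) - Fraction(1, k + 3))

def inner(p, q):
    """Integral of p(tau) q(tau) W0(tau); p and q are coefficient lists,
    lowest power first."""
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def orthogonal_polynomials(n):
    """Gram-Schmidt on 1, tau, tau^2, ... with respect to W0."""
    basis = []
    for m in range(n + 1):
        p = [Fraction(0)] * m + [Fraction(1)]   # start from tau^m
        for g in basis:                         # remove components along earlier G's
            c = inner(p, g) / inner(g, g)
            p = [a - c * (g[i] if i < len(g) else 0) for i, a in enumerate(p)]
        basis.append(p)
    return basis

if __name__ == "__main__":
    for m, g in enumerate(orthogonal_polynomials(3)):
        print(m, g, "k_m =", 1 / inner(g, g))
```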
11.2 WEIGHTING FUNCTIONS FOR
DERIVATIVES
The impulsive admittances defined by (6) for $m > 0$ may not be
regarded as weighting functions even though the response of the
corresponding networks to $E(\lambda)$ is, by (5)
\[ V_m(t) = \int_0^\infty E(t - \tau) \, W_m(\tau) \, d\tau , \]
because, with the exception of $W_0(\tau)$, the $W_m(\tau)$, as will
presently be seen, cannot be normalized. The term weighting function is
reserved for the functions defined by (11) below.
Since $\tau^r$ is a linear combination of the $G_s(\tau)$ where
$s = 0, 1, \ldots, r$, it is obvious from (6) that
\[ \int_0^\infty \tau^r \, W_m(\tau) \, d\tau = 0 \qquad \text{when } r < m . \]
In particular
\[ \int_0^\infty W_m(\tau) \, d\tau = 0 \qquad \text{when } m > 0 . \]
Since the transmission function $Y_m(p)$ of a network is the Laplace
transform of its impulsive admittance (see Section A.3), we have
\[ Y_m(p) = \int_0^\infty W_m(\tau) \, e^{-p\tau} \, d\tau = \sum_{r=0}^{\infty} \frac{(-p)^r}{r!} \int_0^\infty \tau^r \, W_m(\tau) \, d\tau . \tag{9} \]
The first $m$ terms in this series vanish. Hence $Y_m(p)$ will be of the
form
\[ Y_m(p) = p^m \, y_m(p) \tag{10} \]
where $y_m(0) \ne 0$. This permits us to regard the network whose
impulsive admittance is $W_m(\tau)$ as an instantaneous $m$th order
differentiator, corresponding to the factor $p^m$ in (10), in tandem
with a purely smoothing network whose transmission function is
$y_m(p)$.
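The vanishing of the first m terms of the series (9) can be checked
exactly in a simple case. The sketch below assumes the uniform
weighting case of Section 11.3, for which (6) gives W_1 = 6 - 12 tau
and W_2 = 60 - 360 tau + 360 tau^2 on 0 < tau < 1; these coefficient
lists and the function names are written out here for illustration.

```python
from fractions import Fraction
from math import factorial

# Impulsive admittances W_m(tau) on 0 <= tau <= 1, lowest power first
# (assumed uniform-weighting case).
W = {0: [1], 1: [6, -12], 2: [60, -360, 360]}

def moment(coeffs, r):
    """Integral of tau^r * W(tau) over 0..1 for a polynomial W."""
    return sum(Fraction(c, r + i + 1) for i, c in enumerate(coeffs))

def series_coefficient(m, r):
    """Coefficient of p^r in Y_m(p) according to the expansion (9)."""
    return Fraction((-1) ** r, factorial(r)) * moment(W[m], r)

if __name__ == "__main__":
    for m in W:
        print(m, [series_coefficient(m, r) for r in range(4)])
```

In each case the first non-vanishing coefficient is that of p^m, and it
equals y_m(0).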
It is convenient to associate a weighting function $w_m(\tau)$ with the
purely smoothing network whose transmission function is $y_m(p)$.
Dividing (10) through by $p^m$ the resulting operational equation may be
interpreted (see Section A.5) to mean that the weighting function
$w_m(\tau)$ is the $m$-fold integral of the impulsive admittance
$W_m(\tau)$ between the limits 0 and $\tau$. This is expressed by
\[ w_m(\tau) = \int_0^\tau \cdots \int_0^\tau W_m(\tau) \, (d\tau)^m . \tag{11} \]
By a relationship similar to (9) between $y_m(p)$ and $w_m(\tau)$, it
follows from $y_m(0) \ne 0$ that
\[ \int_0^\infty w_m(\tau) \, d\tau \ne 0 . \]
Hence the $w_m(\tau)$ may be normalized in the sense that
\[ \int_0^\infty w_m(\tau) \, d\tau = 1 \]
for all values of $m$. However, this may be done in general only if the
$G_m(\tau)$ polynomials are not normalized in the sense that $k_m = 1$
for any value of $m > 0$. It is in fact readily shown that the
coefficient of $\tau^m$ in $G_m(\tau)$ must be the same as that of
$\tau^m$ in $e^{-\tau}$.

11.3 LEGENDRE POLYNOMIALS

The Legendre polynomials $P_m(x)$ are orthogonal with respect to the
range $-1 < x < 1$ and uniform weighting. In other words, the
polynomials $P_m(2\tau - 1)$ are orthogonal with respect to the range
$0 < \tau < \infty$ and the weighting functionᶜ
\[ W_0(\tau) = 1 \ \text{when}\ 0 \le \tau \le 1 , \qquad W_0(\tau) = 0 \ \text{when}\ \tau > 1 . \]
It is known from Section 10.4 that this form for the weighting function
$W_0(\tau)$ is best in case the tracking errors are flat random noise.
In the integral (1) to be minimized, the $G_m(\tau)$ polynomials should
then be
\[ G_m(\tau) = (-1)^m \, \frac{m!}{(2m)!} \, P_m(2\tau - 1) . \]
The first few of these are tabulated below.

    m    G_m(\tau)
    0    1
    1    1/2 - \tau
    2    1/12 - \tau/2 + \tau^2/2
    3    1/120 - \tau/10 + \tau^2/4 - \tau^3/6

With the help of the formula
\[ \int_{-1}^{1} [P_m(x)]^2 \, dx = \frac{2}{2m + 1} \]
it is readily determined that
\[ \frac{1}{k_m} = \int_0^\infty [G_m(\tau)]^2 \, W_0(\tau) \, d\tau = \frac{(m!)^2}{(2m)! \, (2m+1)!} . \]
Then, by (6)
\[ W_m(\tau) = (-1)^m \, \frac{(2m+1)!}{m!} \, P_m(2\tau - 1) , \quad 0 \le \tau \le 1 ; \qquad W_m(\tau) = 0 , \quad \tau > 1 . \]
Substituting this in turn into (11) and making use of Rodrigues' formula
\[ P_m(x) = \frac{(-1)^m}{2^m \, m!} \, \frac{d^m}{dx^m} (1 - x^2)^m \]
or
\[ P_m(2\tau - 1) = \frac{(-1)^m}{m!} \, \frac{d^m}{d\tau^m} [\tau (1 - \tau)]^m \]
it is finally found that
\[ w_m(\tau) = \frac{(2m+1)!}{(m!)^2} \, [\tau (1 - \tau)]^m , \quad 0 \le \tau \le 1 ; \qquad w_m(\tau) = 0 , \quad \tau > 1 . \tag{12} \]
By a relationship of the form of (9) the transmission functions
$y_m(p)$ corresponding to the weighting functions $w_m(\tau)$ may be
determined. The first three are
\[ y_0(p) = \frac{1 - e^{-p}}{p} \]
\[ y_1(p) = \frac{6}{p^3} \left[ (p - 2) + (p + 2) e^{-p} \right] \]
\[ y_2(p) = \frac{60}{p^5} \left[ (p^2 - 6p + 12) - (p^2 + 6p + 12) e^{-p} \right] . \]
These may be written in the form
\[ y_m(p) = Q_m(\omega) \cdot e^{-p/2} \tag{13} \]
where, with $p = i\omega$ and $x = \omega/2$,
\[ Q_0(\omega) = \frac{\sin x}{x} \]
\[ Q_1(\omega) = \frac{3}{x^2} \left( \frac{\sin x}{x} - \cos x \right) \tag{14} \]
\[ Q_2(\omega) = \frac{15}{x^4} \left[ (3 - x^2) \frac{\sin x}{x} - 3 \cos x \right] \]
or in the infinite power-series form
\[ y_1(p) = 6 \sum_{n=0}^{\infty} \frac{n + 1}{(n + 3)!} \, (-p)^n \]
\[ y_2(p) = 60 \sum_{n=0}^{\infty} \frac{(n + 1)(n + 2)}{(n + 5)!} \, (-p)^n . \tag{15} \]

c The unit of time being equal to the nominal smoothing time.
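As a check on the two ways of writing the transmission function for
m = 2, the sketch below compares the closed form, read here as
y_2(p) = (60/p^5)[(p^2 - 6p + 12) - (p^2 + 6p + 12)e^{-p}], with the
power series (15). Function names are illustrative. The closed form
involves heavy cancellation for small p, which is one practical reason
for preferring the series near zero frequency.

```python
from math import exp, factorial

def y2_closed(p):
    """Closed form of y_2(p); not suitable for p very close to zero."""
    return 60.0 / p ** 5 * ((p * p - 6 * p + 12) - (p * p + 6 * p + 12) * exp(-p))

def y2_series(p, terms=40):
    """Partial sum of the ascending power series (15)."""
    return 60.0 * sum((n + 1) * (n + 2) / factorial(n + 5) * (-p) ** n
                      for n in range(terms))

if __name__ == "__main__":
    for p in (1.0, 2.0, 4.0):
        print(p, y2_closed(p), y2_series(p))
```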
Methods for obtaining physically realizable approximations to the
weighting functions $w_m(\tau)$ or impulsive admittances $W_m(\tau)$,
based upon the Q functions (14) and the series expansions (15) are
described in Chapter 12.
Chapter 12
PHYSICAL REALIZATION OF DATA-SMOOTHING FUNCTIONS
This chapter will be devoted to a brief re-
view of some of the methods and techniques
which have been used in the physical realiza-
tion of data-smoothing or weighting functions.
The first two sections will be devoted to meth-
ods for determining physically realizable ap-
proximations to a desired weighting function.
The third section takes up the use of feedback
amplifiers and servomechanisms in order to
avoid the use of coils of generally fantastic
sizes. The final section takes up the design of
resistance-capacitance networks.
Methods of deriving physically realizable approximations of best
weighting functions may be divided into two classes, which may be
called, for convenience, t-methods and p-methods. The t-methods are
those in which a prescribed best weighting function $W(t)$ is
approximated directly by a function $W_a(t)$ of realizable form, viz., a
sum of decaying exponential terms and exponentially decaying sinusoidal
terms. However, the t-methods are most useful when the approximation is
restricted to a sum only of exponential terms. According to the
discussion in Section A.9, Appendix A, such a restriction corresponds
physically to passive RC transmission networks. A t-method was used by
Phillips and Weiss in the reference quoted in Section 10.3 to obtain an
approximation with one decaying exponential term and one exponentially
decaying sinusoidal term. However, this method rapidly becomes unwieldy
as the number of terms is increased.
The p-methods are those in which the approximation is derived
indirectly from the transmission function $Y(p)$ corresponding to
$W(t)$. A rational function $Y_a(p)$ approximating $Y(p)$ is first
determined. If it is realizable, and it usually is, then
$W_a(t) = L^{-1}[Y_a(p)]$. In general, $Y_a(p)$ will have complex poles
and, therefore, $W_a(t)$ will have exponentially decaying sinusoids as
well as simple exponentials. This gives the p-methods a considerable
advantage over the t-methods in more efficient use of network elements.
The fact that this generally calls for impractical element values in
passive RLC networks is not serious. As shown in Section 12.3, the use
of coils may be avoided entirely by the use of feedback amplifiers.
12.1 t-METHODS
To describe the t-method,ᵃ let
\[ W_a(t) = A_1 e^{-a_1 t} + A_2 e^{-a_2 t} + \cdots + A_n e^{-a_n t} \tag{1} \]
where the a's are prescribed and the A's are to be determined. Two
considerations are involved in the determination of the A's. The first
consideration is based on the relationship between the continuity
conditions at $t = 0$ and the ultimate slope of the loss characteristic
as expressed in the theorem in Section A.8. Accordingly, a number of
relations of the type
\[ A_1 + A_2 + \cdots + A_n = 0 \]
\[ a_1 A_1 + a_2 A_2 + \cdots + a_n A_n = 0 \tag{2} \]
\[ a_1^r A_1 + a_2^r A_2 + \cdots + a_n^r A_n = 0 , \qquad r < n - 1 \]
must be satisfied. This leaves $n - r - 1$ of the A's for the second
consideration.
The second consideration concerns the manner in which the approximation
in the range $t > 0$ is to be made. The approximation may, for example,
be required to pass through $n - r - 1$ points on $W(t)$ or, the first
$n - r - 1$ moments of the approximation may be required to be equal to
the corresponding moments of $W(t)$. The latter is expressed by
relations of the type
\[ \frac{A_1}{a_1^s} + \frac{A_2}{a_2^s} + \cdots + \frac{A_n}{a_n^s} = \frac{1}{(s-1)!} \int_0^\infty W(t) \, t^{s-1} \, dt , \qquad s = 1, 2, \ldots, n - r - 1 . \tag{3} \]
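With the a's prescribed, relations (2) and (3) are linear in the A's
and may be solved directly. The sketch below sets up and solves that
system in exact arithmetic. It assumes the moment relations are read as
the sum of A_i / a_i^s equal to (1/(s-1)!) times the (s-1)th moment of
W(t), takes the parabola 6t(1 - t) on unit smoothing time as the target
(area 1, first moment 1/2), and uses three a's in the ratio 3:2. The
function names and numerical values are illustrative only.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    rows = [[Fraction(x) for x in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def t_method_coefficients(a, r, target_moments):
    """Solve for the A's of (1).

    a              -- prescribed decay constants a_1 .. a_n
    r              -- highest power used in the continuity relations (2)
    target_moments -- right-hand sides of (3) for s = 1 .. n-r-1
    """
    n = len(a)
    matrix, rhs = [], []
    for power in range(r + 1):          # relations (2)
        matrix.append([ai ** power for ai in a])
        rhs.append(0)
    for s in range(1, n - r):           # relations (3)
        matrix.append([1 / ai ** s for ai in a])
        rhs.append(target_moments[s - 1])
    return solve(matrix, rhs)

if __name__ == "__main__":
    a = [Fraction(2), Fraction(3), Fraction(9, 2)]
    print(t_method_coefficients(a, 0, [1, Fraction(1, 2)]))
```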
Foster's investigations were concerned only with the parabolic
weighting function (4) Chapter 10, so that only the first of (2) was
involved. Numerical studies led to the belief that, with a given number
of a's, the best approximation was to be had from the case in which all
of the a's are equal. Hence the natural center of attention was the
special form
\[ W_a(t) = (A_1 t + A_2 t^2 + \cdots + A_{n-1} t^{n-1}) \, e^{-at} . \tag{4} \]
At large values of $t$ this expression reduces approximately to the
last term, and if it is assumed that $A_{n-1} = 1$, the settling
condition fixes $a$ to at least a first approximation. The rest of the
work of approximating the parabola is then equivalent to a problem in
polynomial approximation. Once the A's are determined, a better value of
$a$ can be found from the settling condition, and the process gone
through again.
a The t-method is principally due to R. M. Foster.
If the a's are only approximately equal, the
approximation will still behave approximately
like (4) with an average value used for a. The
difficulty with equal or nearly equal a's is that
it leads to networks with extreme element
values. In order to secure satisfactory element
values, it is generally necessary to depart sub-
stantially from the condition of equal a's. This
results in some, but not a large, loss of effi-
ciency in approximating the parabola. Foster
recommends that the a's be chosen as a geo-
metric series, with their geometric mean more
or less around the equivalent point for equal
a's. With four a's he suggests that the constant
ratio in the series may be 3:2, whereas with
only two a's the ratio should be raised to 2:1.
These are, however, only rough values and
obviously depend on individual opinion of what
constitutes an unreasonable element value.
As a matter of experience, it turns out that the characteristic first
obtained usually has a rather long and slowly decaying tail, as shown in
Figure 1. This, of course, is equivalent to a correspondingly long
"settling time," or time before a useful prediction can be made. In
practice, therefore, after the preliminary design has been found,
adjustments are made to bring the tail of the curve under control,
partly by modifying the values of the A's slightly, and partly by
contracting the time scale to bring the part of the tail which remains
appreciable within the allowable settling time limits. This leads to the
somewhat lopsided match to the parabola shown in Figure 2.
Figure 1. Approximation to parabolic weighting
function, showing poor settling characteristic.
Figure 2. Approximation to parabolic weighting
function, showing better settling characteristic.
A method of bringing the tail of the curve under controlᵇ is to
minimize the expression
\[ \int_T^\infty [W_a(t)]^2 \, dt = \sum_l \sum_m C_{lm} A_l A_m \tag{5} \]
where
\[ C_{lm} = \frac{e^{-(a_l + a_m) T}}{a_l + a_m} \]
under the restrictions (2) and all but the last of (3).
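The double sum in (5) is simply the tail integral of the square of (1)
worked out term by term. The sketch below compares it with a direct
numerical integration, assuming the integral runs from the settling
time T to infinity and that C_lm = exp(-(a_l + a_m)T)/(a_l + a_m). The
A's and a's used in the check are arbitrary illustrative values.

```python
from math import exp

def tail_energy(A, a, T):
    """Double sum (5): sum over l and m of C_lm * A_l * A_m."""
    return sum(Al * Am * exp(-(al + am) * T) / (al + am)
               for Al, al in zip(A, a) for Am, am in zip(A, a))

def tail_energy_numeric(A, a, T, span=40.0, steps=100000):
    """Midpoint-rule integral of W_a(t)^2 from T to T + span."""
    h = span / steps
    total = 0.0
    for k in range(steps):
        t = T + (k + 0.5) * h
        w = sum(Ai * exp(-ai * t) for Ai, ai in zip(A, a))
        total += w * w
    return total * h

if __name__ == "__main__":
    A, a = [1.0, -3.0, 2.0], [2.0, 3.0, 4.5]
    print(tail_energy(A, a, 1.0), tail_energy_numeric(A, a, 1.0))
```

Because (5) is a quadratic form in the A's and the restrictions are
linear, the minimization itself reduces to a linear problem, for
example by Lagrange multipliers.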
The t-method used by Phillips and Weiss is based on a 3-term
approximation of the form (1) in which one $a$ is real while the other
two may be conjugate complex. The a's are not prescribed, so that there
are six parameters to be determined. Four restrictions are imposed,
viz., the first of (2), the first of (3), a restriction on the value of
the tail area, viz.,
\[ \int_T^\infty W_a(t) \, dt = \sum_{l=1}^{3} \frac{A_l \, e^{-a_l T}}{a_l} , \]
and the cross-over condition
\[ W_a(T) = 0 . \]
Finally, the transmitted noise power, which, under the assumption of
flat random noise associated with the position data, takes the form
(see Section 10.4)
\[ \int_0^\infty [W_a(t)]^2 \, dt \]
is minimized with respect to the two remaining parameters by numerical
methods.
b Used by R. F. Wick.
12.2 p-METHODS
Three p-methods have been used. These will be described in
chronological order.
The first p-method is one which was used by R. L. Dietzold in
exploiting the use of feedback amplifiers to secure the advantages of
approximations with complex exponentials. The transmission function
$Y(p)$ corresponding to the best weighting function $W(t)$ is first
formulated. The loss characteristic, $-20 \log_{10} |Y(i\omega)|$, is
next computed and plotted against the frequency on a logarithmic scale.
Then standard equalizer design techniques are employed to approximate
the loss characteristic, keeping in mind that the transmission loss in
the feedback network of a feedback amplifier becomes a transmission gain
for the circuit as a whole.
The second p-method is merely a more complete analytic formulation of
the first, thereby avoiding the necessity for employing equalizer design
techniques. It depends upon the possibility of expressing the
transmission function corresponding to the best weighting function, in
the form of equation (13) Chapter 11, which is associated with the
symmetry of the weighting function, as shown in Section A.7. The method
is based upon the determination of the envelope of the Q-function. The
Q-function is first differentiated in order to obtain the equation which
determines the values of $\omega$ at which the maxima and minima occur.
This transcendental equation is not solved but is used to eliminate the
trigonometric functions in the expression of the Q-function. The
resulting expression, which is an irrational function of $\omega^2$, is
then squared in order to make it a rational function of $\omega^2$. The
substitution $p^2 = -\omega^2$ is made and the expression is then
resolved into two factors of which one contains all the poles with
negative real parts while the other contains all the poles with positive
real parts, the two factors being conjugate complex when $p = i\omega$.
The first factor is then taken as an approximation of the desired
transmission function. Applying the method to the desired transmission
functions defined by (13) and (14) of Chapter 11, we get
\[ y_0(p) = \frac{2}{2 + p} \]
\[ y_1(p) = \frac{12}{12 + 6p + p^2} \tag{6} \]
\[ y_2(p) = \frac{120}{120 + 60p + 12p^2 + p^3} . \]
This last is the basis for the design of a position and rate smoothing
circuit for a proposed computor for controlling bombers from the
ground.¹¹ This design is described briefly in Chapter 13.
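The envelope property on which the second p-method rests can be
illustrated for m = 1. The sketch below assumes Q_1 in the form
(3/x^2)(sin x / x - cos x) with x = omega/2, locates one stationary
point of Q_1 by bisection, and compares |Q_1| there with the magnitude
of the rational function 12/(12 + 6p + p^2) at p = i omega. The
bracketing interval and helper names are choices made for this
illustration.

```python
from math import sin, cos

def Q1(x):
    """Q_1 of (14), Chapter 11, as a function of x = omega / 2."""
    return 3.0 / x ** 2 * (sin(x) / x - cos(x))

def approx_gain(omega):
    """|y_1(i omega)| for the rational approximation 12 / (12 + 6p + p^2)."""
    p = 1j * omega
    return abs(12.0 / (12.0 + 6.0 * p + p * p))

def stationary_condition(x):
    """Vanishes where dQ_1/dx = 0 for x > 0: (3 - x^2) sin x - 3 x cos x."""
    return (3.0 - x * x) * sin(x) - 3.0 * x * cos(x)

def bisect(f, lo, hi, iterations=80):
    """Plain bisection; assumes f changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Stationary point of Q_1 lying between two of its zeros.
x_star = bisect(stationary_condition, 4.5, 7.7)

if __name__ == "__main__":
    print(x_star, abs(Q1(x_star)), approx_gain(2.0 * x_star))
```

The two magnitudes agree at the stationary point, which is what is
meant by taking the rational function as the envelope of the
Q-function; between such points the rational function need not bound
Q_1.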
The third p-method is based upon the ascending power-series expansion
of the transmission function corresponding to the best weighting
function. Examples of such power series are given by (15) of Chapter 11.
The method of approximation is one which is credited to Padé in
O. Perron's "Kettenbrüchen." If the discussion in Section A.8 is
referred to, it will be seen to be also a method of moments.
The method consists in determining the coefficients in a rational
function of the form
\[ \frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n} \tag{7} \]
so that the ascending power-series expansion of the rational function
will agree with that of the best transmission function, term for term up
to and including $p^{m+n}$. If the series for the best transmission
function is
\[ 1 + c_1 p + c_2 p^2 + \cdots + c_{m+n} p^{m+n} + \cdots \tag{8} \]
the equations which determine the coefficients in (7) are obtained by
equating coefficients of corresponding powers of $p$, up to and
including the $(m + n)$th, in
\[ (1 + b_1 p + \cdots + b_n p^n)(1 + c_1 p + \cdots + c_{m+n} p^{m+n}) \]
and
\[ 1 + a_1 p + \cdots + a_m p^m . \]
The last $n$ equations will be homogeneous in the b's and c's.
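The coefficient matching just described amounts to solving n linear
equations for the b's and then reading off the a's. The sketch below
does this in exact arithmetic and, as an example, applies it to the
series (15) for y_2(p) with m = 0 and n = 3. The function names and the
choice of example are illustrative, and no claim is made that the
result coincides with any particular design in the report.

```python
from fractions import Fraction
from math import factorial

def solve(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    rows = [[Fraction(x) for x in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def pade(c, m, n):
    """Return (a, b), with a[0] = b[0] = 1, so that the ratio of the two
    polynomials agrees with sum c_k p^k through the power p^(m+n).
    c must hold c_0 = 1 .. c_(m+n)."""
    c = [Fraction(x) for x in c]
    # The last n equations: sum_{j=1..n} b_j c_{k-j} = -c_k, k = m+1 .. m+n.
    matrix = [[c[k - j] if k - j >= 0 else Fraction(0) for j in range(1, n + 1)]
              for k in range(m + 1, m + n + 1)]
    rhs = [-c[k] for k in range(m + 1, m + n + 1)]
    b = [Fraction(1)] + (solve(matrix, rhs) if n else [])
    # The first m + 1 equations then give the numerator coefficients.
    a = [sum(b[j] * c[k - j] for j in range(min(k, n) + 1)) for k in range(m + 1)]
    return a, b

if __name__ == "__main__":
    c = [Fraction(60 * (k + 1) * (k + 2) * (-1) ** k, factorial(k + 5))
         for k in range(4)]
    print(pade(c, 0, 3))
```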
It has been expedient in some cases to omit the last few of the
$(m + n)$ equations in order to have some control over the number of
real roots and poles and the number of conjugate pairs of complex roots
and poles in the resulting rational function.
In the assumed rational expression (7) the difference $n - m$ should be
chosen so that the ultimate slope of the loss characteristic will be the
same as for the best transmission function. According to the theorem in
Section A.8, if $W(t)$ behaves like $t^r$ as $t \to 0$, we should take
$n - m = r + 1$. As a matter of experience the rational expression has
invariably turned out to be physically realizable whenever this "rule"
was followed. Frequently, however, the rational expression has turned
out to be physically realizable under small departures from the rule.
Examples of this method are given in Chapter 13.
12.3 USE OF FEEDBACK AMPLIFIERS
AND SERVOMECHANISMS
In this section we shall describe the use of
feedback amplifiers and servomechanisms to
obtain desired transmission functions. For com-
plete discussions of the most recent technical
advances in the analysis and design of feedback
amplifiers and servomechanisms the reader
should consult some of the modern literature
on these subjects.2,3,5,15,16,17
Let us assume that we have two networks whose transmission functions
are $Y_1(p)$ and $Y_2(p)$, respectively, as shown in Figure 3. For a
signal $E(t)$ applied to the first network the short-circuit output
current is $I_1(t) = Y_1(p) \cdot E(t)$. For a signal $V(t)$ applied to
the second network the short-circuit output current is
$I_2(t) = Y_2(p) \cdot V(t)$. With the networks sharing a common
short-circuiting conductor as shown in Figure 4, the current through the
conductor is $I_1 + I_2$. If the source which develops the voltage
$V(t)$ across the input terminals of the second network were in fact
under the control of the current through the conductor, as shown
schematically in Figure 5, in such a manner that it had to develop that
voltage $V(t)$ which reduces the current in the conductor to zero, then
\[ Y_1(p) \cdot E(t) + Y_2(p) \cdot V(t) = 0 . \]
Hence, the transmission function (now a voltage-voltage ratio) of the
arrangement shown in Figure 5 must be
\[ Y(p) = - \frac{Y_1(p)}{Y_2(p)} . \tag{9} \]
Figure 3. Schematic representation of networks for feedback circuit
application.
Figure 4. First step in combining networks.
Figure 5. Output voltage controlled by short-circuit current across
intermediate terminals.
This relationship provides a method of obtaining transmission functions
with complex poles without the requirement of coils.ᶜ The complex roots
of $Y(p)$ must be assigned to the numerator of $Y_1(p)$, and the complex
poles of $Y(p)$ to the numerator of $Y_2(p)$. Aside from this, the other
roots and poles of $Y(p)$ may be assigned in any way which is favorable
to good design practice. Redundant factors may be introduced if they are
desirable, as is done in the examples described in Sections 13.1.5 and
13.3.
The source of the voltage V(t) in Figure 5
does not have to be controlled by the current
through the short-circuiting conductor. Since
the current through any short circuit must be
zero if the voltage across the short-circuited
terminals is zero before the short circuit is con-
nected across them, the source of the voltage
V(t) may just as well be controlled by the
open-circuit voltage, as shown in Figure 6. It
is clear that the source of the voltage V(t) is
ideally an infinite gain amplifier. It is not nec-
essary, however, that the amplifier have ideally
unilateral transmission and infinite input and
output impedances, since departures from these
ideal characteristics may be compensated for in
the design of the feedback network.
The simple result expressed by (9) may be readily modified to take
account of the finite gain of a physical amplifier. The modification
will be expressed as an extra factor which corresponds to the
"μβ effect" or "μβ error" commonly encountered in the theory and design
of feedback amplifiers.
c This observation was first made by R. L. Dietzold.
Figure 6. Output voltage controlled by open-
circuit voltage across intermediate terminals.
The exact transmission function of the circuit shown in Figure 6 is
most simply expressed in terms of the following quantities:
$Y_1(p)$ = current through a short across terminal-pair No. 3, per unit
emf applied across terminal-pair No. 1.
$Y_2(p)$ = current through a short across terminal-pair No. 3, per unit
emf applied across terminal-pair No. 2.
$Z_2(p)$ = impedance between terminal-pair No. 2, with terminal-pair
No. 3 shorted.
$Z_3(p)$ = impedance between terminal-pair No. 3, with amplifier dead,
terminal-pair No. 1 shorted, and terminal-pair No. 2 open.
$G(p)$ = transadmittance of amplifier.
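Then
\[ Y = - \frac{Y_1}{Y_2} \cdot \frac{1 + \dfrac{Y_2}{G}}{1 - \dfrac{1}{G Y_2 Z_2 Z_3}} . \tag{10} \]
The quantity $G Y_2 Z_2 Z_3$ is the $\mu\beta$ of the circuit. The
quantity $Y_1 Y_2 Z_2 Z_3$ to which $Y$ reduces when $G = 0$ represents
the direct transmission of the circuit.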
The active impedance across terminal-pair No. 2 is
\[ Z_{2A} = \frac{Z_{2P}}{1 - G Y_2 Z_2 Z_3} \tag{11} \]
where
\[ Z_{2P} = Z_2 \, (1 + Y_2^2 Z_2 Z_3) . \tag{12} \]
$Z_{2P}$ is the passive impedance across terminal-pair No. 2. It
differs from $Z_2$ in that terminal-pair No. 3 is open.
The exact expression (10) of the transmis-
sion function is useful chiefly as a check on the
simpler but approximate expression (9). It is
in general quite practicable to make the trans-
admittance or transconductance $G$ of the amplifier large enough so
that the μβ effect may be neglected.
In accordance with the sense in which the term "servomechanism" is used
by MacColl,⁴ a feedback circuit, such as that shown in Figure 6, is a
servomechanism (more specifically, an electronic servomechanism), since
it operates on the ideal principle of maintaining zero voltage across
the terminal-pair No. 3. An electromechanical counterpart of the circuit
shown in Figure 6 is shown in Figure 7. These circuits assume that the
signal $E(t)$ is a modulated d-c carrier.
Figure 7. Electromechanical counterpart of feedback amplifier circuit
resulting in servomechanism.
If the signal is a modulated a-c carrier,
"shaping" cannot be done conveniently by elec-
trical networks. The difficulty may be avoided
by various special devices. An example is de-
scribed and illustrated in Section 13.4.
12.4 DESIGN OF RC NETWORKS
In this section we will describe and illustrate
two general methods of designing RC networks.
The first is most useful when the transmission function is finite and not zero at zero frequency; the second, when the transmission function is zero at zero frequency. The case of a transmission function with a pole at zero frequency will not be considered, since it is covered by the methods described in the preceding section, in conjunction with the methods described below.
Let
$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_{n+1} p^{n+1}}{1 + b_1 p + \cdots + b_n p^n}, \qquad (a_0 > 0) \tag{13}$$
with simple, real, negative poles. Dividing by p, expanding into partial fractions and multiplying through by p, we get
$$Y(p) = \frac{a_{n+1}}{b_n}\,p + a_0 + \left(\frac{A_1 p}{p + \alpha_1} + \frac{A_2 p}{p + \alpha_2} + \cdots\right) - \left(\frac{B_1 p}{p + \beta_1} + \frac{B_2 p}{p + \beta_2} + \cdots\right)$$
where the A's, B's, α's and β's are positive real quantities. The first term must be associated with those in the first parentheses if $a_{n+1} > 0$, with those in the second parentheses if $a_{n+1} < 0$.
The transmission function is now in the form
$$Y(p) = Y_A(p) - Y_B(p) \tag{14}$$
where $Y_A(p)$ and $Y_B(p)$ are physically realizable driving-point admittances of RC type.
Each term of the form $pA/(p + \alpha)$ is the admittance of the two-terminal, two-element network shown in Figure 8. Each term in (14) therefore represents a parallel combination of two-element networks of the type shown in Figure 8 and a conductance $a_0$ in the case of $Y_A(p)$, and a capacitance $|a_{n+1}|/b_n$ in the case of either $Y_A(p)$ or $Y_B(p)$. By well-known methods these two-terminal networks may be transformed into a variety of other configurations.

Figure 8. Simple RC network.
Figure 9. Method of realizing RC transmission functions, requiring phase inverter. (Blocks: phase inverter, summing amplifier.)
The transmission function (14) may be realized in the arrangement shown in Figure 9 or in that shown in Figure 10. The latter is a lattice network which is suitable only in a balanced-to-ground circuit. To obtain an unbalanced passive equivalent of this network we may resort to steps which will be described later in this section.

Figure 10. Lattice prototype for passive networks with RC transmission characteristics. (Labels: line branch; $I = (Y_A - Y_B)E$.)
The second general method of designing RC networks is most useful when
$$Y(p) = p\,\frac{a_0 + a_1 p + \cdots + a_n p^n}{1 + b_1 p + \cdots + b_n p^n}, \qquad (a_0 > 0) \tag{15}$$
with simple, real, negative poles. Now, if the lattice in Figure 10 were driven from an infinite-impedance source of current $I_s$, the output current would be
$$I = I_s\,\frac{1 - Y_B/Y_A}{1 + Y_B/Y_A}.$$
If, furthermore,
$$\frac{Y_B}{Y_A} = \frac{k - Y(p)/p}{k + Y(p)/p} \tag{16}$$
then
$$I = I_s\,\frac{Y(p)}{kp}.$$
Taking it for granted for the moment that the lattice can be transformed as shown schematically in Figure 11, we may then discard the condenser across the output terminals and, by Thevenin's theorem, we may replace the condenser across the input terminals and the infinite-impedance current source by a series condenser and a zero-impedance voltage source. The result is shown in Figure 12. Since $I_s = pCE$ we now have
$$I = \frac{C}{k}\,Y(p)\,E$$
which is the desired result, to a constant factor.
The factor k should in general be taken as small as possible subject to the requirement that all the roots and poles of (16) be simple, real, and negative. It can always be taken large enough to fulfill this requirement. A suitable value may be easily chosen by inspection of a plot of $Y(p)/p$ for negative real values of p.

Figure 11. Step in transformation of networks with zero transmission at zero frequency.
Figure 12. Final step in transformation of networks with zero transmission at zero frequency.

The numerator and denominator of (16) are of equal degree and therefore contain the same number of linear factors. These factors may be assigned to $Y_A$ or to $Y_B$ arbitrarily except that $Y_A$ and $Y_B$ must be physically realizable driving-point admittance functions which behave ultimately like condensers as the frequency increases indefinitely; that is, roots and poles must alternate and there must be a simple pole at infinity.
There are five kinds of steps which may be
taken to transform a lattice into an unbalanced
form. These steps are based upon Bartlett's
bisection theorem,14 and may be taken in any
order and as often as necessary. Each of them
will now be described as it would be applied
directly to Figure 10. In the following diagrams
a lattice enclosed in a rectangle means an un-
balanced network whose configuration may not
be known yet, but whose lattice prototype is as
indicated.
1. Shunt network pulled out of both branches: shown in Figure 13.
2. Shunt network pulled out of the line branch only: shown in Figure 14.
3. Series network pulled out of both branches: shown in Figure 15.*
4. Series network pulled out of the lattice branch only: shown in Figure 16.*
5. Breakdown into parallel lattices: a fairly obvious step which need not be illustrated.
* Given in impedance form.

Figure 13. Step in transformation of lattice; shunt networks pulled out of both branches.
Figure 14. Step in transformation of lattice; shunt network pulled out of line branch only.
Figure 15. Step in transformation of lattice; series networks pulled out of both branches.
Figure 16. Step in transformation of lattice; series network pulled out of lattice branch only.
As an example of (13) consider
$$Y(p) = \frac{a_0 + a_1 p + a_2 p^2}{1 + b_1 p}$$
where all the coefficients are positive. Since
$$Y(p) = \frac{a_2}{b_1}\,p + a_0 - \frac{(a_2 - a_1 b_1 + a_0 b_1^2)\,p}{b_1(1 + b_1 p)}$$
there is no problem if $a_1 > (a_2/b_1) + a_0 b_1$. But if $a_1 < (a_2/b_1) + a_0 b_1$ we have the problem of transforming the lattice in Figure 17. We can apply steps 2 and 4 immediately, but find that the residual lattice cannot be transformed unless $a_1 > (a_2/b_1)$. Under this additional restriction we can apply step 3, obtaining finally the network shown in Figure 18.

Figure 17. Illustrative lattice prototype.
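The split used in this example can be checked with exact arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the coefficient values in the usage lines are arbitrary numbers chosen for the check, not values from the text. It verifies that the capacitance term, the conductance term and the single two-element term add up to the original Y(p), and reports whether the two-element term has to go into the lattice branch.

```python
from fractions import Fraction


def split_second_order(a0, a1, a2, b1):
    """Split Y(p) = (a0 + a1 p + a2 p^2)/(1 + b1 p) into
    (a2/b1) p + a0 - c p/(1 + b1 p); returns (a2/b1, a0, c)."""
    a0, a1, a2, b1 = (Fraction(x) for x in (a0, a1, a2, b1))
    capacitance = a2 / b1
    conductance = a0
    c = (a2 - a1 * b1 + a0 * b1 ** 2) / b1  # coefficient of the subtracted term
    return capacitance, conductance, c


def needs_lattice(a0, a1, a2, b1):
    """True when the two-element term is subtracted, i.e. a1 < a2/b1 + a0 b1."""
    return split_second_order(a0, a1, a2, b1)[2] > 0


def y_direct(p, a0, a1, a2, b1):
    """Y(p) evaluated from the rational form."""
    return (a0 + a1 * p + a2 * p * p) / (1 + b1 * p)


def y_split(p, a0, a1, a2, b1):
    """Y(p) evaluated from the three separated terms."""
    cap, g, c = split_second_order(a0, a1, a2, b1)
    return cap * p + g - c * p / (1 + b1 * p)


# Arbitrary illustrative coefficients (assumed, not from the text).
p = Fraction(3, 7)
print(y_direct(p, 1, 2, 3, 4) == y_split(p, 1, 2, 3, 4))
print(needs_lattice(1, 2, 3, 4), needs_lattice(1, 10, 3, 4))
```

With the first set of coefficients the subtracted term is present, so the lattice of Figure 17 would be needed; with the second set the condition $a_1 > (a_2/b_1) + a_0 b_1$ holds and no lattice is required.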
As an example of (15) consider
Taking k = 1 (the smallest value which may be assigned), we get
$$\frac{Y_B}{Y_A} = \frac{2p(3 + 16p)}{(1 + 2p)(1 + 16p)}.$$
One way of choosing $Y_A$ and $Y_B$ is
$$Y_A = \frac{(1 + 2p)(1 + 16p)}{2(3 + 16p)}, \qquad Y_B = p.$$
This leads finally to the network shown in Figure 19. Such a simple network is possible of course because Y(p) happens to satisfy the requirements of a physically realizable driving-point admittance function. However, another way of choosing $Y_A$ and $Y_B$ is
$$Y_A = \frac{1 + 2p}{2}, \qquad Y_B = \frac{p(3 + 16p)}{1 + 16p}.$$
This leads to the network shown in Figure 20.
Figure 18. Unbalanced equivalent of illustrative lattice prototype when $a_2/b_1 < a_1 < (a_2/b_1) + a_0 b_1$.
Figure 19. RC network with zero transmission at zero frequency.
Figure 20. Another RC network with zero transmission at zero frequency.
Chapter 13
ILLUSTRATIVE DESIGNS AND PERFORMANCE ANALYSIS
THE ILLUSTRATIVE material described in this chapter is taken from four practical applications.
1. Second-derivative circuit for the M9 anti-
aircraft director.
2. Position data smoother for the "close sup-
port plotting board," with delay correction for
constant velocity aircraft.
3. Position and rate circuit for the "com-
puter for controlling bombers from the
ground," with optional delay correction of posi-
tion data for constant-velocity aircraft.
4. Position and rate circuit using electromechanical servomechanisms.
The design and analytical procedure used in
the first application has not heretofore been
described in writing. Hence, considerably more
space will be devoted to it than to the other
three applications. The latter have been de-
scribed in detail in reports.1" 1; 13
13.1
SECOND-DERIVATIVE CIRCUIT DESIGN
13.1.1
Realizable Approximation of Best Transmission Function
The best transmission function for the second-derivative circuit was taken to be
$$Y_2(p) = p^2 y_2(p),$$
in the notation of Chapter 11. This assumes flat random noise in position data and, arbitrarily, 1-second smoothing and settling time. The series expansion of $y_2(p)$ is, according to expressions (15) of Chapter 11,
$$y_2(p) = 1 - \frac{1}{2}p + \frac{1}{7}p^2 - \frac{5}{168}p^3 + \frac{5}{1008}p^4 - \cdots.$$
The form of the rational approximation,
$$y(p) = \frac{1}{1 + b_1 p + b_2 p^2 + b_3 p^3 + b_4 p^4},$$
was chosen for simplicity under the requirement that the transmission function $p^2 y(p)$ should cut off at the rate of 12 db per octave.*
This requirement was set as a precaution
against noise due to granularity of the coordi-
nate-conversion potentiometers in the director.
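Following the procedure outlined in Section 12.2 the following equations were obtained:
$$b_1 - \frac{1}{2} = 0$$
$$b_2 - \frac{1}{2}b_1 + \frac{1}{7} = 0$$
$$b_3 - \frac{1}{2}b_2 + \frac{1}{7}b_1 - \frac{5}{168} = 0$$
$$b_4 - \frac{1}{2}b_3 + \frac{1}{7}b_2 - \frac{5}{168}b_1 + \frac{5}{1008} = 0$$
whence
$$b_1 = \frac{1}{2}, \qquad b_2 = \frac{3}{28}, \qquad b_3 = \frac{1}{84}, \qquad b_4 = \frac{1}{1764},$$
so that the denominator of y(p) is $\frac{1}{1764}\left(p^4 + 21p^3 + 189p^2 + 882p + 1764\right)$. Since
$$p^4 + 21p^3 + 189p^2 + 882p + 1764 = \left(p^2 + \frac{21 + \sqrt{21}}{2}\,p + 42\right)\left(p^2 + \frac{21 - \sqrt{21}}{2}\,p + 42\right),$$
y(p) would have two conjugate pairs of complex poles, viz.,
$$p = -6.40 \pm i1.047, \qquad -4.10 \pm i5.02,$$
of which one pair is very nearly real.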
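The coefficient matching above can be reproduced with exact fractions. The sketch below assumes that the best weighting function behind $y_2(p)$ is $30t^2(1-t)^2$ on 0 < t < 1 (the form tabulated for the best ramp response in the next section), so that the series coefficients are moments of that function. The function names are hypothetical.

```python
import cmath
from fractions import Fraction
from math import factorial, sqrt


def best_series(n_terms):
    """Series coefficients of y2(p), assuming W(t) = 30 t^2 (1 - t)^2.
    The k-th moment of W is 60 / ((k+3)(k+4)(k+5))."""
    return [Fraction((-1) ** k * 60, (k + 3) * (k + 4) * (k + 5) * factorial(k))
            for k in range(n_terms)]


def reciprocal_series(c, n_terms):
    """Coefficients b_k of 1 / (c0 + c1 p + ...), truncated to n_terms."""
    b = [1 / c[0]]
    for k in range(1, n_terms):
        b.append(-sum(c[j] * b[k - j] for j in range(1, k + 1)) / c[0])
    return b


def quadratic_roots(b, c):
    """Roots of p^2 + b p + c."""
    d = cmath.sqrt(b * b - 4 * c)
    return (-b + d) / 2, (-b - d) / 2


series = best_series(5)
b_coeffs = reciprocal_series(series, 5)   # 1, b1, b2, b3, b4
print(series)
print(b_coeffs)
print(quadratic_roots((21 + sqrt(21)) / 2, 42))
print(quadratic_roots((21 - sqrt(21)) / 2, 42))
```

Matching five terms of the series fixes all four b's; the roots of the two quadratic factors are the two conjugate pole pairs quoted in the text.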
In order to simplify the circuit design, however, it was desirable to limit the number of complex poles to a single conjugate pair. This was accomplished by leaving $b_4$ arbitrary so that the denominator of y(p) was
$$1 + \frac{1}{2}p + \frac{3}{28}p^2 + \frac{1}{84}p^3 + b_4 p^4.$$
A value for $b_4$ which would make this expression vanish at two negative real values of p was found by plotting
$$1764\,b_4 = \frac{21}{x^4}\left(x^3 - 9x^2 + 42x - 84\right)$$
against x = -p, as shown in Figure 1. The right-hand member is positive only in the range x > 3.77 and has a maximum of 0.982 at about x = 6.63.

Figure 1. Graphical determination of $b_4$. (Ordinate: $1764\,b_4$; abscissa: x.)

* The design antedated the formulation of the n - m = r + 1 rule given in Section 12.2, according to which the best transmission function should have been taken as $p^2 y_3(p)$ in the notation of Chapter 11. However, no trouble was experienced in obtaining a physically realizable approximation, of the complexity assumed.
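The graphical step can be imitated numerically. The following sketch evaluates the right-hand member, locates its zero crossing by bisection and its maximum by a simple grid scan; the search intervals and step count are arbitrary choices made for this illustration.

```python
def rhs(x):
    """Right-hand member plotted in Figure 1: 21 (x^3 - 9x^2 + 42x - 84) / x^4."""
    return 21.0 * (x ** 3 - 9.0 * x ** 2 + 42.0 * x - 84.0) / x ** 4


def zero_crossing(lo=3.0, hi=5.0, tol=1e-9):
    """Bisection for the point where rhs changes from negative to positive."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def maximum(lo=4.0, hi=10.0, steps=6000):
    """Grid scan for the location and value of the maximum of rhs."""
    best_x = max((lo + (hi - lo) * i / steps for i in range(steps + 1)), key=rhs)
    return best_x, rhs(best_x)


print(zero_crossing())
print(maximum())
```

Any value of $1764\,b_4$ between zero and the maximum cuts the curve at two values of x, which are the magnitudes of the two real poles; a value well below the maximum separates them widely.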
In order to obtain a substantial separation between the two real poles of y(p), the value $1764\,b_4 = 0.5$ was chosen. The approximation
$$y(p) = \frac{1}{1 + \frac{1}{2}p + \frac{3}{28}p^2 + \frac{1}{84}p^3 + \frac{1}{3528}p^4}$$
has poles at
$$p = -4.17391, \qquad -31.72813, \qquad -3.04898 \pm i4.16463.$$
The series expansion of y(p) agrees with that of $y_2(p)$ to four terms, the fifth term being $\frac{37}{7056}p^4$ instead of $\frac{5}{1008}p^4$. The difference in the fifth term is less than 6 per cent.
The realized approximation and the best weighting function are shown in Figure 3.
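Both statements about the chosen approximation (the pole locations and the fifth series term) can be checked directly. The sketch below uses exact fractions for the series and floating point for the pole check; the names are hypothetical.

```python
from fractions import Fraction

# Denominator coefficients of the chosen approximation, with 1764 b4 = 0.5.
DEN = [Fraction(1), Fraction(1, 2), Fraction(3, 28), Fraction(1, 84), Fraction(1, 3528)]

# Poles as quoted in the text.
POLES = [complex(-4.17391, 0.0), complex(-31.72813, 0.0),
         complex(-3.04898, 4.16463), complex(-3.04898, -4.16463)]


def series_of_reciprocal(den, n_terms):
    """Power-series coefficients of 1 / (den[0] + den[1] p + ...)."""
    c = [1 / den[0]]
    for k in range(1, n_terms):
        acc = sum(den[j] * c[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        c.append(-acc / den[0])
    return c


def denominator(p):
    """Denominator polynomial evaluated at a (complex) value of p."""
    return sum(float(coef) * p ** k for k, coef in enumerate(DEN))


series = series_of_reciprocal(DEN, 5)
print(series[4], float(series[4] / Fraction(5, 1008) - 1))
print([abs(denominator(p)) for p in POLES])
```

The relative excess of the fifth term over 5/1008 comes out as 2/35, in line with the "less than 6 per cent" quoted above.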
13.1.2
Transient Responses
The responses of the physical network whose transmission function is $p^2 y(p)$ are compared to those of the best network whose transmission function is $p^2 y_2(p)$, in Figures 2, 3, and 4. The signals for which (and the formulas by which) these responses were computed are tabulated below.

            Signal                  Response formulas
Figure      t < 0     t >= 0        Realized             Best
2           0         1             L^-1[p y(p)]         60t(1 - 2t)(1 - t)
3           0         t             L^-1[y(p)]           30t^2(1 - t)^2
4           0         t^2/2         L^-1[y(p)/p]         t^3(10 - 15t + 6t^2)
It has been noted that Figure 3 also repre-
sents the best and the realized weighting func-
tions.
Figure 2. Responses to step function, viz., E(t) = 1 when t > 0. (Curves: best, realized.)
Figure 3. Responses to linear ramp function, viz., E(t) = t when t > 0; second derivative smoothing functions. (Curves: best, realized.)
Figure 4. Responses to parabolic ramp function, viz., E(t) = (1/2)t^2 when t > 0; second derivative settling characteristics.
If a signal of the form
$$E(t) = a_0 + a_1 t + \frac{1}{2}a_2 t^2$$
were to be applied suddenly to the second-derivative circuit at t = 0 the response would be
$$V(t) = \frac{a_0}{T^2}\,A_0\!\left(\frac{t}{T}\right) + \frac{a_1}{T}\,A_1\!\left(\frac{t}{T}\right) + a_2\,A_2\!\left(\frac{t}{T}\right)$$
where $A_0$, $A_1$, $A_2$ stand for the responses shown in Figures 2, 3, and 4, respectively, and where t is the time in seconds and T is the nominal smoothing time. The response V(t) is the indicated acceleration of the target.
The sudden application of the instantaneous
position and velocity components of the signal
to the second-derivative circuit will give rise to
some very serious consequences unless special
measures are taken to mitigate them. To see
this let it be assumed that T = 20 seconds and that the target is at such a range that $a_0$ = 20,000 yards when the signal E(t) is applied to the second-derivative circuit. Each unit of $A_0$ in the ordinate scale of Figure 2 then represents an indicated acceleration of 50 yd per sec^2. Referring to Figure 2 it is clear not only
that the effective settling time will be several
times the smoothing time but also that the indi-
cated acceleration will go through exceedingly
large maxima.
Exceedingly large transient responses are
not peculiar to second-derivative circuits. They
occur also in first-derivative circuits in linear
prediction, where they are due entirely to the
initial position term in the signal. In all cases
they are reduced to harmless proportions by
special arrangements of the circuits during the
operation of slewing.
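The size of the slewing transient can be illustrated with the best-network response formulas. The sketch below assumes the closed forms $60t(1-2t)(1-t)$, $30t^2(1-t)^2$ and $t^3(10-15t+6t^2)$ on 0 <= t <= 1 for the three unit responses (zero, zero and one respectively afterwards), and superposes them with the scale factors $a_0/T^2$, $a_1/T$ and $a_2$. The velocity and acceleration values in the usage lines are arbitrary illustrative numbers.

```python
def best_a0(t):
    """Best-network response to a unit step (normalized time)."""
    return 60.0 * t * (1 - t) * (1 - 2 * t) if 0.0 <= t <= 1.0 else 0.0


def best_a1(t):
    """Best-network response to a unit ramp (normalized time)."""
    return 30.0 * t * t * (1 - t) ** 2 if 0.0 <= t <= 1.0 else 0.0


def best_a2(t):
    """Best-network response to t^2/2 (normalized time)."""
    if t < 0.0:
        return 0.0
    if t > 1.0:
        return 1.0
    return t ** 3 * (10 - 15 * t + 6 * t * t)


def indicated_acceleration(t, T, a0, a1, a2):
    """V(t) = (a0/T^2) A0(t/T) + (a1/T) A1(t/T) + a2 A2(t/T)."""
    x = t / T
    return a0 / T ** 2 * best_a0(x) + a1 / T * best_a1(x) + a2 * best_a2(x)


T = 20.0
print(20000.0 / T ** 2)  # yd per sec^2 represented by one unit of A0
peak = max(indicated_acceleration(0.02 * i, T, 20000.0, 0.0, 0.0) for i in range(1001))
print(peak)
print(indicated_acceleration(30.0, T, 20000.0, 150.0, 3.0))
```

Even with the ideal network the position term alone produces an indicated acceleration of several hundred yd per sec^2 during the first smoothing interval, which is the effect the special slewing arrangements are meant to suppress.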
13.1.3
Effect of Tracking Errors on Accuracy of Prediction
The statistical effect of tracking errors on the accuracy of prediction is most readily determined from the power spectrum of the tracking errors and the transmission function of the prediction circuit.
Table 1 gives the values of the transmission function $Y_1$ of the first-derivative circuit in the M9 director, referred to a nominal smoothing time of 1 second, and the transmission function $Y_2$ of the experimental second-derivative circuit design, also referred to a nominal smoothing time of 1 second. The transmission function of the linear prediction circuit with 10-second smoothing of first derivative is then
$$Y_l(p) = 1 + \frac{G_1}{10}\,Y_1(10p)$$
while that of the quadratic prediction circuit with 20-second smoothing of second derivative is
$$Y_q(p) = Y_l(p) + \frac{G_2}{400}\,Y_2(20p).$$
$G_1$ and $G_2$ are determined in accordance with the discussion in Section A.10. Since
$$Y_1(p) = p(1 - 0.3724p + \cdots), \qquad Y_2(p) = p^2(1 - \cdots),$$
we get
$$G_1 = t_f, \qquad G_2 = \frac{1}{2}t_f^2 + 3.724\,t_f.$$

Table 1*
 9f      Re Y1     Im Y1     Re Y2     Im Y2
  1      0.174     0.666    -0.454     0.165
  2      0.651     1.166    -1.442     1.212
  3      1.312     1.358    -2.014     3.527
  4      1.943     1.203    -1.069     6.688
  5      2.382     0.821     2.000     9.409
  6      2.599     0.364     6.575    10.115
  7      2.637    -0.067    10.893     8.220
  8      2.558    -0.429    13.468     4.695
  9      2.416    -0.711    14.096     0.953
 10      2.242    -0.920    13.401    -2.092
 11      2.062    -1.070    12.064    -4.320
 12      1.885    -1.172    10.530    -5.777
 13      1.720    -1.238     9.027    -6.704
 14      1.566    -1.279     7.652    -7.169
 15      1.429    -1.299     6.438    -7.398
 16      ...       ...       5.382    -7.446
 17      ...       ...       4.471    -7.374
 18      1.096    -1.286     3.683    -7.221
 19      1.004    -1.268     3.015    -7.025
 20      0.926    -1.247     2.436    -6.795
 22      0.790    -1.198     1.509    -6.292
 24      0.683    -1.145     0.818    -5.780
 26      0.593    -1.091     0.301    -5.287
 28      0.518    -1.040    -0.088    -4.828
 30      0.457     ...      -0.380    -4.402
 32      0.407    -0.945    -0.599    -4.016
 34      0.364    -0.902    -0.762    -3.666
 36      0.326    -0.862    -0.881    -3.348
 38      0.296    -0.825    -0.967    -3.062
 40      0.266    -0.790    -1.026    -2.800
* f is in c when smoothing time T = 1 sec. For T-second networks, values of 9f are multiples of 1/9T c, values of $Y_1$ should be divided by T, and values of $Y_2$ should be divided by T^2. The two networks may have different values of T.
Table 2 gives the values of $|Y_l(p)|^2$ and of $|Y_q(p)|^2$ for $t_f$ = 5, 10, 15, 20 seconds. These are plotted in Figures 5, 6, 7, and 8.
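An entry of Table 2 can be rebuilt from Table 1 under the scaling rules stated in the Table 1 footnote. The sketch below assumes the prediction circuits have the form $1 + (t_f/T_1)\,Y_1$ and, for quadratic prediction, an added term $(G_2/T_2^2)\,Y_2$ with $G_2 = \frac{1}{2}t_f^2 + 0.3724\,T_1 t_f$, where $Y_1$ and $Y_2$ are the 1-second values read from Table 1 at the same physical frequency. The function names are hypothetical.

```python
def linear_circuit(y1, t_f, T1=10.0):
    """Linear prediction transmission from the 1-second value of Y1."""
    return 1.0 + (t_f / T1) * y1


def quadratic_circuit(y1, y2, t_f, T1=10.0, T2=20.0):
    """Quadratic prediction transmission from 1-second values of Y1 and Y2."""
    g2 = 0.5 * t_f ** 2 + 0.3724 * T1 * t_f
    return linear_circuit(y1, t_f, T1) + (g2 / T2 ** 2) * y2


# Frequency 1/90 c: row 9f = 1 for the 10-second network (Y1),
# row 9f = 2 for the 20-second network (Y2).
Y1_ROW_1 = complex(0.174, 0.666)
Y2_ROW_2 = complex(-1.442, 1.212)

for t_f in (5.0, 10.0):
    lin = abs(linear_circuit(Y1_ROW_1, t_f)) ** 2
    quad = abs(quadratic_circuit(Y1_ROW_1, Y2_ROW_2, t_f)) ** 2
    print(t_f, round(lin, 2), round(quad, 2))
```

The squared magnitudes obtained this way can be compared with the 90f = 1 row of Table 2 for the 5-second and 10-second times of flight.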
The last column of Table 2 and Figure 9 give the power spectrum of a composite of the range and transverse errors in a typical run. The power contained in the frequency range covered by the table accounts for 78 per cent of the total power, or an rms error of 15.8 yards out of 17.9 yards.
The rms error of prediction is the square root of the power transmitted by the prediction circuit. This is tabulated on the last line of Table 2 and in the smaller table following.

Figure 5. Power transmission ratio of linear and quadratic prediction circuits with 5-second prediction time.
Figure 6. Power transmission ratio of linear and quadratic prediction circuits with 10-second prediction time.

Table 2
           t_f = 5           t_f = 10          t_f = 15          t_f = 20
 90f     |Yl|^2  |Yq|^2    |Yl|^2  |Yq|^2    |Yl|^2  |Yq|^2    |Yl|^2  |Yq|^2      P*
  0       1.00    1.00      1.00    1.00      1.00    1.00      1.00    1.00     31.4
  1       1.29    1.13      1.82    1.60      2.59    2.71      3.59    4.81     33.5
  2       2.10    2.76      4.08    8.90      6.97   23.16     10.74   50.35     35.7
  3       3.20    6.85      7.19   26.73     12.96   72.51     20.51  159.43     19.7
  4       4.2    10.0      10.1    39.5      18.6   106.1      29.76  231.3       3.6
  5       5.0    10.5      12.1    39.9      22.4   104.4      35.9   223.9       2.5
  6       5.3     9.8      13.1    35.6      24.3    90.6      38.9   190.6       1.2
  7       5.4     8.8      13.2    30.8      24.6    76.6      39.4   158.4       1.6
  8       5.2     7.9      12.8    26.6      23.8    64.7      38.2   131.8       2.1
  9       5.0     7.1      12.2    23.0      22.5    55.0      36.0   110.6       1.4
 10       4.7     6.3      11.4    20.0      21.0    47.0      33.5    93.5       0.7
 11       4.4     5.7      10.5    17.5      19.3    40.4      30.8    79.6       0.8
 12       4.1     5.1       9.7    15.3      17.7    35.0      28.3    68.2       0.8
 13       3.8     4.6       8.9    13.5      16.3    30.4      25.8    58.9       0.5
 14       3.6     4.2       8.2    12.1      14.9    27.1      23.6    52.0       0.3
 15       3.4     3.8       7.6    10.6      13.7    23.4      21.6    44.5       0.8
 16       3.2     3.5       7.0     9.5      12.6    20.6      19.8    39.0       1.1
 17       3.0     3.2       6.5     8.5      11.6    18.3      18.2    34.4       0.8
 18       2.8     3.0       6.0     7.7      10.7    16.3      16.8    30.4       0.4
 19       2.7     2.8       5.6     7.0       9.9    14.6      15.5    27.0       0.7
 20       2.5     2.6       5.3     6.3       9.2    13.1      14.4    24.1       1.0
 rms error of
 prediction
         23.9    29.5      33.9    53.4      44.5    85.4      55.4   125.0
* P is in units of 180 yd^2 per c.

Time of flight      Rms error of prediction due to tracking errors, in yards
in seconds          Linear        Quadratic
 5                  23.9           29.5
10                  33.9           53.4
15                  44.5           85.4
20                  55.4          125.0
It is obviously relatively disadvantageous to
use quadratic prediction when the target is in
fact flying a rectilinear unaccelerated course.
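The rms figures can be approximated from the tabulated spectrum. The sketch below assumes that the tabulated frequencies are spaced 1/90 c apart, that P is in units of 180 yd^2 per c, and that the transmitted power is summed by the trapezoidal rule; these integration details are assumptions of this illustration, not something the text states. The lists are the P column and the 5-second linear column of Table 2.

```python
from math import sqrt

# Power spectrum P from Table 2, rows 90f = 0 ... 20.
P = [31.4, 33.5, 35.7, 19.7, 3.6, 2.5, 1.2, 1.6, 2.1, 1.4, 0.7,
     0.8, 0.8, 0.5, 0.3, 0.8, 1.1, 0.8, 0.4, 0.7, 1.0]

# Power transmission ratio of the linear circuit, 5-second time of flight.
YL5 = [1.00, 1.29, 2.10, 3.20, 4.2, 5.0, 5.3, 5.4, 5.2, 5.0, 4.7,
       4.4, 4.1, 3.8, 3.6, 3.4, 3.2, 3.0, 2.8, 2.7, 2.5]


def trapezoid(values):
    """Trapezoidal sum for unit spacing."""
    return sum(values) - 0.5 * (values[0] + values[-1])


def rms_error(power_ratio, spectrum, unit=180.0, spacing=1.0 / 90.0):
    """Square root of the transmitted power, in yards."""
    transmitted = [r * s for r, s in zip(power_ratio, spectrum)]
    return sqrt(unit * spacing * trapezoid(transmitted))


print(rms_error([1.0] * len(P), P))  # tracking error within the tabulated band
print(rms_error(YL5, P))             # linear prediction, 5-second time of flight
```

Under these assumptions the first value is close to the 15.8 yards quoted for the tabulated band and the second is close to the 23.9 yards on the last line of Table 2.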
Figure 7. Power transmission ratio of linear and quadratic prediction circuits with 15-second prediction time.
Figure 8. Power transmission ratio of linear and quadratic prediction circuits with 20-second prediction time.
The relative advantage of linear prediction
should persist for target paths with only a
slight amount of curvature, but this relative
advantage should decrease as the curvature is
increased. When the curvature exceeds a cer-
tain amount, the relative advantage should
shift to quadratic prediction.
The determination of the minimum value of target path curvature at which quadratic prediction becomes relatively advantageous depends not only upon:
1. dispersion of the predicted point of impact due to tracking errors,
but also upon a number of other factors, which are:
2. actual future position of target with respect to the predicted point of impact, assuming an accurate computer and the absence of all sources of dispersion enumerated here;*
3. dispersion due to inaccuracies in the computer and data-transmission systems;
4. dispersion due to noise in the computer and data-transmission systems;
5. dispersion due to variations in actual dead time;
6. dispersion due to gun wear and to variations in powder charge, shell weight, shell shape, etc.;
7. dispersion due to variations in meteorological conditions along the path of the shell;
8. dispersion due to variability of time-fuze calibration; and
9. lethal pattern of shell burst.

Figure 9. Composite power spectrum of tracking errors of experimental radar.

In a special illustrative case, a numerical analysis, including most of these factors (estimated), showed that quadratic prediction becomes relatively advantageous when the target acceleration exceeds about 0.1g. However, this should not be taken as a general result.
* This is considered in detail in the next section.
13.1.4
Linear and Quadratic Prediction Errors on Constant-Velocity Circular Courses
The use of a finite number of derivatives of
the tracking data for purposes of prediction is
itself a source of prediction errors even if there
were no tracking errors. Definite evaluation of
these prediction errors can be made only if the
path of the target is prescribed. The simplest
path which can be prescribed for this purpose
is a circular one at constant velocity. Such a
path is fairly realistic when considered in rela-
tion to the difficulty of maneuvering a bomber
and to actual records of the paths of hostile
bombers over London during World War II.
The position of a target flying in a circle at constant velocity, referred to the center of the circle, is expressed by the complex quantity $Re^{i\omega t}$ where R is the radius of the circle and ω is the angular rate. In terms of the velocity V and the transverse acceleration A, we have $R = V^2/A$, $\omega = A/V$. The predicted position is then at $R\,Y(i\omega)\,e^{i\omega t}$ where $Y(i\omega)$ is the transmission function of the prediction circuit. The true future position of the target, however, is at $R\exp[i\omega(t + t_f)]$. Hence, the prediction error, referred to axes fixed on the target and oriented respectively transverse to and in the direction of the present velocity, is
$$\epsilon = R\left[Y(i\omega) - e^{i\omega t_f}\right].$$
As an illustration let us consider a case in which V = 150 yd per sec, A = 5 yd per sec^2 and $t_f$ = 10. For the linear prediction circuit
$$Y_l(i\omega) = 1.0409 + i0.3296$$
and for the quadratic prediction circuit
$$Y_q(i\omega) = 0.9501 + i0.3610$$
while
$$e^{i\omega t_f} = 0.9450 + i0.3272.$$
Hence, when the present position of the target is at 4500 + i0 with respect to the center of the circle, the linear predicted point is at 4684 + i1483, the quadratic predicted point is at 4276 + i1624 while the true future position is at 4252 + i1472. These are shown in Figure 10. The prediction error vectors are
$$\epsilon_l = 432 + i11, \qquad |\epsilon_l| = 432$$
$$\epsilon_q = 24 + i152, \qquad |\epsilon_q| = 154$$
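The numbers in this illustration follow directly from the error formula. The short sketch below takes the two quoted values of the transmission functions as given inputs and evaluates $R[Y(i\omega) - e^{i\omega t_f}]$; the function name is hypothetical.

```python
import cmath


def circular_course_error(V, A, t_f, Y):
    """Prediction error R [Y(i w) - exp(i w t_f)] for a constant-speed turn."""
    R = V * V / A        # turn radius, yards
    w = A / V            # angular rate, radians per second
    return R * (Y - cmath.exp(1j * w * t_f))


Y_LINEAR = complex(1.0409, 0.3296)     # quoted value for the linear circuit
Y_QUADRATIC = complex(0.9501, 0.3610)  # quoted value for the quadratic circuit

eps_l = circular_course_error(150.0, 5.0, 10.0, Y_LINEAR)
eps_q = circular_course_error(150.0, 5.0, 10.0, Y_QUADRATIC)
print(eps_l, abs(eps_l))
print(eps_q, abs(eps_q))
```

Small differences from the rounded figures in the text arise because the text subtracts positions already rounded to the nearest yard.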
Referring to Figure 10 it may be observed that if the first-derivative component of the prediction were to be reduced by approximately 10 per cent a nearly perfect hit would be obtained. This suggests the possibility of determining empirical functions of the time of flight for the potentiometer factors $G_1$ and $G_2$ in order to improve the probability of kill. This would involve consideration of all of the sources of dispersion enumerated in the preceding section as well as a statistical study of target paths. Such a determination has not been attempted.

Figure 10. Vector diagram of linear and quadratic prediction for constant-velocity circular courses. (Labels: center of turn; present position; first derivative lead vector; second derivative lead vector; linear predicted position; quadratic predicted position; true future position, 10 sec.)
13.1.5
Physical Configuration of the Second-Derivative Circuit
In this section we shall derive a physical configuration for the second-derivative circuit. In particular it illustrates the application of feedback to the realization of weighting functions or impulsive admittances involving complex exponentials in general.* It should be pointed out, however, that the application of feedback to the end in view is not restricted to purely electronic circuits. An application involving the use of servomechanisms will be described in Section 13.4.
* Originally proposed by R. L. Dietzold.
The transmission function which concerns us here may be expressed in the partially factored form
$$Y(p) = \frac{p^2}{(p + 0.2087)(p + 1.5864)(p^2 + 0.3049p + 0.0666)}$$
where the poles have been adjusted to correspond to T = 20 seconds and where a constant factor has been left out.
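The constants in this factored form are the 1-second poles of Section 13.1.1 divided by the smoothing time. A minimal sketch of that rescaling follows, taking the quoted 1-second poles as input; the function names are hypothetical.

```python
# Poles of the realized approximation for 1-second smoothing (Section 13.1.1).
POLES_T1 = [complex(-4.17391, 0.0), complex(-31.72813, 0.0),
            complex(-3.04898, 4.16463), complex(-3.04898, -4.16463)]


def scale_poles(poles, T):
    """Poles of y(Tp): each 1-second pole is divided by T."""
    return [p / T for p in poles]


def quadratic_factor(pole):
    """Coefficients (b, c) of p^2 + b p + c for a conjugate pair."""
    return -2.0 * pole.real, abs(pole) ** 2


scaled = scale_poles(POLES_T1, 20.0)
print(-scaled[0].real, -scaled[1].real)   # real-pole constants
print(quadratic_factor(scaled[2]))        # quadratic factor coefficients
```

The printed constants can be compared with 0.2087, 1.5864, 0.3049 and 0.0666 in the expression above.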
The circuit is to be designed to work out of the amplifier in the first-derivative circuit of the M9 director. Since this much of the first-derivative circuit has a transmission function of the form p/(p + 0.24), the transmission function which we have to realize is $Y_1(p)/Y_2(p)$ where
$$Y_1(p) = \frac{p}{(p + 0.2087)(p + 1.5864)}$$
and
$$Y_2(p) = \frac{p^2 + 0.3049p + 0.0666}{p + 0.24}.$$
The inversion of the factor corresponding to $Y_2(p)$ is in accordance with the fact that the transmission gain through a feedback amplifier is equal to the loss in the feedback network, provided the feedback is very large. To realize the transmission function $Y_1(p)/Y_2(p)$ it is therefore necessary only to realize the transmission functions $Y_1(p)$ and $Y_2(p)$ individually. The corresponding networks are shown in Figure 11, with typical element values.

Figure 11. Physical configuration of quadratic prediction circuit for modified M9 AA director.
The input network has four elements, whereas $Y_1(p)$ has only two parameters. Hence there are two degrees of freedom in the element values of this network. One degree of freedom must be reserved for the impedance level; the other permits some latitude in the relative values of the resistances and stiffnesses.
The feedback network has four independent elements, whereas $Y_2(p)$ has three parameters. Hence there is only one degree of freedom in the element values of this network. This degree of freedom must be reserved for the impedance level.
There is, however, one degree of freedom be-
tween the impedance levels of the two net-
works. This follows from the fact that the
transmission function of the circuit is the ratio
of the transmission functions of the individual
networks. The scale factor for the transmission
function of the circuit is readily determined
from the fact that the transmission function
must be approximately pRt,C„ at small values
of p.
13.2
CIRCUIT FOR CLOSE SUPPORT
PLOTTING BOARD
In this application, position data smoothing
with delay correction for constant rates of
change in position was required. Assuming flat
random noise in position data, and, arbitrarily,
1-second smoothing time, the best transmission
function for position data smoothing without
delay correction is yu(v) in the notation of
Section 11.3. The best transmission function
for the first-derivative circuit, if it were re-
quired, is pyx (p) . Hence, the best transmission
function for position data smoothing with full
delay correction is
= »o(p) + g P*l(p) •
This corresponds to the weighting function
Wi(t) = 14,(0
= 2(2-3/) 0 < / < 1 .
The series expansion for Y,(p) is, by (15)
of Chapter 11,
P4
Yi(p)
PJ + £ _ JL- +
12 T 30 120 T
CONFIDENTIAL
132
ILLUSTRATIVE DESIGNS AND PERFORMANCE ANALYSIS
The form of the rational approximation was chosen as
$$Y(p) = \frac{1 + a_1 p}{1 + b_1 p + b_2 p^2 + b_3 p^3}$$
in order to obtain a loss characteristic which has an ultimate slope of 12 db per octave.* This requirement was also set as a precaution against noise due to granularity of the coordinate-conversion potentiometers. The coefficients are determined by
$$b_1 = a_1$$
$$b_2 - \frac{1}{12} = 0$$
$$b_3 - \frac{1}{12}b_1 + \frac{1}{30} = 0$$
$$-\frac{1}{12}b_2 + \frac{1}{30}b_1 - \frac{1}{120} = 0$$
whence
$$Y(p) = \frac{1 + \frac{11}{24}p}{1 + \frac{11}{24}p + \frac{1}{12}p^2 + \frac{7}{1440}p^3}.$$
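The four conditions can be solved exactly, starting from the weighting function $2(2 - 3t)$. The sketch below assumes that the series coefficients are the moments of that weighting function on 0 < t < 1, and it relies on the coefficient of p in the series being zero, as it is here; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def delay_corrected_series(n_terms):
    """Series coefficients of Y1(p), assuming W(t) = 4 - 6t on 0 < t < 1:
    c_k = (-1)^k / k! * (4/(k+1) - 6/(k+2))."""
    return [Fraction((-1) ** k, factorial(k))
            * (Fraction(4, k + 1) - Fraction(6, k + 2))
            for k in range(n_terms)]


def match_coefficients(c):
    """Solve (1 + b1 p + b2 p^2 + b3 p^3)(c0 + c1 p + ...) = 1 + a1 p
    through p^4, for a series with c0 = 1 and c1 = 0."""
    assert c[0] == 1 and c[1] == 0
    b2 = -c[2]
    b1 = -(c[4] + b2 * c[2]) / c[3]
    b3 = -(b1 * c[2] + c[3])
    return b1, b1, b2, b3  # a1, b1, b2, b3


series = delay_corrected_series(5)
print(series)
print(match_coefficients(series))
```

The result reproduces a1 = b1 = 11/24, b2 = 1/12 and b3 = 7/1440, the coefficients appearing in Y(p) above.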
This may be expressed in the form $Y(p) = Y_a(p)/Y_b(p)$ where
$$Y_a(p) = \frac{1}{1 + 0.1053p}$$
$$Y_b(p) = \frac{1 + 0.3530p + 0.04615p^2}{1 + 0.4583p}.$$
The circuit configuration is shown below in Figure 12.

Figure 12. Physical configuration of data-smoothing circuit for close support plotting board.

* This design also antedated the formulation of the n - m = r + 1 rule given in Section 12.2 according to which we should have taken $Y_1(p) = y_1(p) + \frac{1}{2}\,p\,y_2(p)$.
13.3
CIRCUIT FOR GROUND-CONTROL BOMBING COMPUTER
In this application, rate smoothing as well as
position smoothing was required. In addition,
delay correction in position, for constant rate
of change, was to be available but optional, and
the loss characteristic was to have an ultimate
slope of 12 db per octave, or more.
In accordance with the n - m = r + 1 rule, the best transmission function for position data is $y_1(p)$, whereas that for rate is $p\,y_2(p)$. A number of designs were made on this basis. However, from the point of view of network economy they were inferior to a design based on $y_2(p)$ for position data. The use of $y_2(p)$ for position data is not consistent, theoretically, with the use of $p\,y_2(p)$ for rate, but the practical advantage outweighs the theoretical disadvantage.
The rational approximation used for $y_2(p)$ is the one given in (6), Section 12.2. It may be expressed as
$$y_2(p) = \frac{Y_a(p)\,Y_c(p)}{Y_b(p)}$$
where
$$Y_a(p) = \frac{1}{1 + 0.2153p}$$
$$Y_b(p) = \frac{1 + 0.2847p + 0.03870p^2}{1 + 0.1359p}$$
$$Y_c(p) = \frac{1}{1 + 0.1359p}.$$
It may be noted that a redundant factor has been introduced, viz., 1 + 0.1359p, in order to secure a physically realizable $Y_b(p)$. The coefficient was chosen so that a resistance would not be required in the shunt branch of the feedback network. Referring to the circuit configuration in Figure 13, the transmission function of the input network is $Y_a(p)$, that of the feedback network is $Y_b(p)$, and that of the output network at the top is $Y_c(p)$.
The output impedance of the amplifier is reduced nearly to zero by virtue of shunt feedback. Hence, the rate circuit, as shown in Figure 13, may be derived from the amplifier output through a simple additional network whose transmission function is $p\,Y_c(p)$. Two rate outputs are provided so that the delay introduced in position may be corrected optionally without disturbing scale factors.

Figure 13. Physical configuration of linear prediction circuit for ground-control bombing computer.

13.4
CIRCUIT USING SERVOMECHANISMS
In the final report, October 25, 1945, to
NDRC Division 7, on the research program car-
ried on under Contract NDCrc-178, a list is
given of a number of the more important prac-
tical advantages for the use of a-c carrier in
computing circuits. These advantages are:
1. Permits operation at lower levels before
running into trouble with thermal noise, contact
potentials, drifts due to temperature;
2. Permits use of transformers for imped-
ance matching, voltage transformations, cou-
pling between balanced and unbalanced circuits ;
3. Permits use of hybrid coils for voltage
summations of moderate precision ;
4. Eliminates the necessity for modulators in
servo circuits using a-c motors ;
5. Permits reduction in total power consump-
tion, rectified power for amplifiers, and voltage
regulation.
However, the techniques of differentiation
and of data smoothing with fixed networks in
computing circuits which use d-c carrier, are
not applicable to computing circuits which use
a-c carrier.
The circuit described here is an example of one of the techniques used in the T15-E1 experimental curved flight director. In Figure 14 servo motors* are indicated by M, and generators by G. The motors are two-phase induction motors with one phase winding of each energized directly by the carrier source at constant amplitude. The generators are essentially two-phase induction motors also with one phase winding of each energized directly by the carrier source at constant amplitude. They deliver, at the other phase windings, carrier voltage at amplitudes proportional to the angular velocities $\dot\theta_1$ and $\dot\theta_2$ of the shafts. The potentiometers are energized by the carrier source at constant amplitude. They deliver carrier voltage at amplitudes proportional to the angular positions $\theta_1$ and $\theta_2$ of the shafts from some reference positions. The position data are represented by the modulation amplitude E.
* The technique of using servo motors for smoothing, as described above, is due chiefly to E. L. Norton.

Figure 14. Electromechanical linear prediction circuit.

With amplifiers of sufficiently large voltage gain and power capacity, and motors of sufficiently large torque, the operational equations of the circuit are readily found by equating to zero the sum of the voltages applied to each amplifier. Thus
$$\theta_1 + (\alpha_1 + \beta p)\,\theta_2 = E$$
$$p\,\theta_1 - (1 + \alpha_2 p)\,\theta_2 = 0$$
whence
$$\theta_1 = \frac{1 + \alpha_2 p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E$$
$$\theta_2 = \frac{p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E.$$
The angular position $\theta_1$ therefore represents the smoothed position data while the angular position $\theta_2$ represents the smoothed rate.
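The step from the two operational equations to the closed forms is a 2-by-2 linear solve, which can be checked at any complex frequency. In the sketch below the constants and the test frequency are arbitrary illustrative values, and the function names are hypothetical.

```python
def solve_servo(p, alpha1, alpha2, beta, E=1.0):
    """Solve  theta1 + (alpha1 + beta p) theta2 = E,
              p theta1 - (1 + alpha2 p) theta2 = 0   by Cramer's rule."""
    a11, a12 = 1.0, alpha1 + beta * p
    a21, a22 = p, -(1.0 + alpha2 * p)
    det = a11 * a22 - a12 * a21
    theta1 = E * a22 / det
    theta2 = -a21 * E / det
    return theta1, theta2


def closed_form(p, alpha1, alpha2, beta, E=1.0):
    """Closed-form smoothed position and smoothed rate."""
    den = 1.0 + (alpha1 + alpha2) * p + beta * p * p
    return (1.0 + alpha2 * p) * E / den, p * E / den


p = complex(0.3, 2.0)                   # arbitrary test frequency
args = (0.7, 0.4, 0.05)                 # arbitrary alpha1, alpha2, beta
print(solve_servo(p, *args))
print(closed_form(p, *args))
```

At low frequencies $\theta_1$ tends to E and $\theta_2$ tends to pE, which is the sense in which the two shaft angles represent smoothed position and smoothed rate.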
Chapter 14
VARIABLE AND NONLINEAR CIRCUITS
The past discussion has been more or less
clearly directed at predictor systems hav-
ing certain well-defined properties. For ex-
ample, it has been tacitly assumed that the first
part of the prediction system will consist of
geometrical manipulations transforming the
raw input data into other quantities, such as
the components of velocity in Cartesian or in-
trinsic coordinates, which we have some physi-
cal reason to believe should be approximately
constant for extended periods.ᵃ These quantities,
then, are isolated explicitly in the circuit
and are the actual effective inputs of the data-
smoothing networks. The data-smoothing net-
works themselves are, of course, definitely
assumed to be linear and invariable.
This is obviously a straightforward attack
but it does not necessarily exhaust all possibili-
ties. For example, advantages may be gained
by using data-smoothing networks which are
nonlinear or which vary with time or target
position. It may also be possible to smooth the
input data according to some geometric as-
sumption, such as straight line flight, without
the necessity of isolating geometrical parame-
ters explicitly.
This chapter attempts to illustrate these pos-
sibilities by some rather scattered examples.
Data-smoothing networks which vary with time
seem to give improved performance over fixed
networks, and have been studied with some
care. Several examples are given at the end of
the chapter. None of the other lines, however,
has been explored at all thoroughly. The ex-
amples of data-smoothing networks variable
with time are, in a sense, illustrations of non-
linearity also, since they all operate on the
assumption that the cycle of the network's
variation with time begins anew at each
marked change in course. Since a change in
course is exactly like a tracking error, except
that it is much larger, this resetting requires
a nonlinear control circuit which responds
to large amplitude effects but not to small ones.
This, however, is evidently a very mild sort of
nonlinearity. More thoroughgoing nonlinearities
have not been studied. There seems to be
no a priori reason for supposing that they
would appreciably improve the performance
of data-smoothing networks.
ᵃ This is true ideally even in the Wiener system since
Wiener assumes that transformations will be made to
some suitable coordinate system, preferably the intrinsic,
before the statistical prediction method is applied.
The first part of the chapter gives examples
of data-smoothing schemes which do not re-
quire the isolation of geometrical parameters.
They are based on degenerative feedback cir-
cuits which satisfy the requisite formal rela-
tions but which might, in some cases, be un-
stable in practice. This portion of the material
is included primarily for its possible sugges-
tive value rather than for its concrete practical
usefulness.
14.1 THE PROTOTYPE FEEDBACK
CIRCUIT
The diversity of particular circuits can be
given a certain unity by regarding them all as
modifications of the feedback smoothing cir-
cuit shown originally in Figure 2 of Chapter
10. In accordance with the discussion of that
figure it will be convenient to suppose that the
resistive feedback path is introduced to limit
the gain of the amplifier proper, so that the
structure reduces to an amplifier with high but
finite gain and a pure capacity feedback. The
circuit has a net loop gain, and is consequently
degenerative, at any moderately high frequency.
For our present purposes, it is convenient to
recall the general property of degenerative
feedback amplifiers, that they tend to suppress
any given frequency by the amount of the de-
generative feedback for that frequency. This
suppression obtains not only at the amplifier
output but at many other points in the circuit
as well. For example, it holds at the amplifier
input if we combine the original applied voltage
with the voltage contributed by the feedback
circuit.ᵇ Thus, except for the absolute
signal level, it is not necessary to transmit
through the amplifier of Figure 2 of Chapter
10 in order to produce the smoothing effect. It
would be sufficient to hang the input circuit of
the amplifier, as a two-terminal impedance,
across the circuit.
ᵇ This follows immediately from the fact that, since
the characteristics of the amplifier proper are not
changed by the addition of the feedback path, the
output voltage is always a fixed multiple of the net
input voltage.
14.2 SIMULTANEOUS SMOOTHING IN
THREE COORDINATES
The property of degenerative feedback cir-
cuits which has just been described is con-
veniently illustrated by a three-dimensional ex-
tension of the original smoothing circuit of
Figure 2 of Chapter 10. The three-dimensional
circuit is shown in Figure 1.
Figure 1. Feedback smoothing in three coordinates.
The three input
voltages are the quantities $\dot D$, $D\dot E$, and $D\dot A \cos E$,
where D, E, and A are, respectively, slant
range, elevation, and azimuth. The three volt-
ages will be recognized as the three components
of the target motion in a tilted and rotating
rectangular coordinate system. One axis of the
tilted system is directed along the instan-
taneous line of sight to the target and the other
two are perpendicular to this one in the vertical
and horizontal planes respectively.ᶜ It is
assumed that these input rates represent target
motion in a straight line, plus the usual track-
ing errors. The object of the smoothing system
is to provide shunt impedances which will tend
to suppress the tracking errors by feedback
action, according to the principles described in
the preceding section, without disturbing the
portions of the input voltages corresponding to
the assumed straight line path.
We can simplify the analysis by restricting
our attention to the special case of two-dimen-
sional motion which occurs when the target
course lies in a vertical plane passing directly
through the antiaircraft position. This is illus-
trated in Figure 2. In this case the component
$D\dot A \cos E$ is evidently zero. If we represent
the voltages at the other two terminals, including
both the original applied voltages and the
voltages fed back through the circuit, by $V_1$ and
$V_2$, the voltages coming out of the coordinate
converter on the right-hand side in Figure 2
are
$$v_x = V_1 \cos E - V_2 \sin E$$
$$v_y = V_2 \cos E + V_1 \sin E . \qquad (1)$$
These voltages are differentiated, passed
through a second coordinate converter, and fed
back so that the output voltages must satisfy
$$V_1 = \dot D - \mu\,(\dot v_x \cos E + \dot v_y \sin E)$$
$$V_2 = D\dot E - \mu\,(\dot v_y \cos E - \dot v_x \sin E) . \qquad (2)$$
In order to exhibit the smoothing action of
the circuit let us denote the observed velocity
components, referred to the upright and fixed
rectangular coordinate system, by $u_x$ and $u_y$,
so that
$$u_x = \dot D \cos E - D\dot E \sin E$$
$$u_y = D\dot E \cos E + \dot D \sin E . \qquad (3)$$
Substituting (2) and (3) into (1), we get
$$v_x = u_x - \mu \dot v_x$$
$$v_y = u_y - \mu \dot v_y$$
or
$$\mu \dot v_x + v_x = u_x$$
$$\mu \dot v_y + v_y = u_y .$$
These show clearly that $v_x$ and $v_y$ are smoothed
values of $u_x$ and $u_y$, respectively. If $\mu$ is constant
the smoothing is of fixed exponential type. If $\mu$
is proportional to the time up to some maximum
value, the smoothing is of the variable
type described in Sections 14.6 and 14.7.
ᶜ This is the coordinate system which was used in the
experimental T15 director. A complete prediction circuit
can be obtained by using the three voltages described
here as inputs to the lead servos in the T15
system. In the actual T15 system, rates in the tilted
and rotating coordinate system were obtained by the
so-called "memory point" method. The voltages $\dot D$, $D\dot E$,
etc., required with the present method, might be obtained
with the help of tachometers attached to the
tracking shafts to measure the instantaneous values of
$\dot D$, $\dot E$, and $\dot A$. An equivalent to the variable smoothing
of the memory point method can be obtained by making
the gains in the feedback paths in Figure 1 variable
according to the principles described in a later section.
To complete the discussion of the circuit we
observe that by (1)
$$V_1 = v_x \cos E + v_y \sin E$$
$$V_2 = v_y \cos E - v_x \sin E .$$
These show that $V_1$ and $V_2$ are the smoothed
rate components referred to the tilted and
rotating rectangular coordinate system. The
fact that the orientation of this coordinate sys-
tem, which depends upon the observed angular
height E, is not smoothed makes no difference
to the computation of the leads because this
computation is made instantaneously in the
same coordinate system to which the smoothed
rate components are instantaneously referred.
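The reason the loop can smooth heavily without disturbing a straight-line course is that the rotated components $u_x$, $u_y$ are constant along such a course, even though $\dot D$ and $D\dot E$ themselves vary. The Python sketch below illustrates this for the two-dimensional case; it is not from the original report, and the function names and the sample target path are hypothetical.

```python
import math


def tilted_rates(x, y, vx, vy):
    """Rates along and across the line of sight for a target at (x, y) with
    velocity (vx, vy), in a vertical plane through the observer."""
    D = math.hypot(x, y)              # slant range
    E = math.atan2(y, x)              # elevation
    D_dot = (x * vx + y * vy) / D     # slant range rate
    D_E_dot = (x * vy - y * vx) / D   # slant range times elevation rate
    return D_dot, D_E_dot, E


def to_fixed(V1, V2, E):
    """Rotation of equations (1) and (3): tilted components to fixed ones."""
    return (V1 * math.cos(E) - V2 * math.sin(E),
            V2 * math.cos(E) + V1 * math.sin(E))


def to_tilted(vx, vy, E):
    """Inverse rotation: fixed components back to the tilted system."""
    return (vx * math.cos(E) + vy * math.sin(E),
            vy * math.cos(E) - vx * math.sin(E))


# Hypothetical straight-line course: start (6000, 1500), velocity (-150, 20).
components = []
for t in (0.0, 5.0, 10.0, 20.0):
    D_dot, D_E_dot, E = tilted_rates(6000.0 - 150.0 * t, 1500.0 + 20.0 * t,
                                     -150.0, 20.0)
    components.append(to_fixed(D_dot, D_E_dot, E))
```

Every entry of `components` equals the constant velocity (-150, 20), so a smoother acting on these quantities passes the straight-line course unchanged and acts only on the tracking errors.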
The analysis in the general case including
all three coordinates is of the same nature.
Since the rate components in fixed rectangular
coordinates appear in the middle of the feed-
back path, it is perhaps not fair to regard the
circuit as an illustration of a data-smoothing
device which does not rely upon the explicit
isolation of the geometrical parameters of the
assumed target path. It should be pointed out,
however, that in comparison with a straight-
forward geometrical solution in which velocity
components in fixed coordinates are first isolated
explicitly, then smoothed, and then used to form
the basis of prediction, the circuit in Figure 1
has the advantage that most of the components
can be built with very low precision. What is
transmitted around the feedback loop is essen-
tially the tracking errors only. Since tracking
errors are always small, very high percentage
errors in the system can be tolerated.ᵈ
ᵈ An exception to this statement must be made for
errors in the coordinate converters which fluctuate
rapidly with target position.
Figure 2. Feedback smoothing in two coordinates.
14.3 SMOOTHING NETWORKS VARIABLE
WITH TARGET POSITION
It was mentioned earlier that changing the
data-smoothing network with the target coordinates
represented one way in which the results
obtained from fixed networks could be
generalized. In a sense, the coordinate conver-
sions of Figure 1 are illustrations of these
possibilities. A better illustration, however, is
provided by the circuit of Figure 3. The structure
is intended to give smooth slant range
rate from slant range data, under the assumption
of unaccelerated straight line target
motion.
Figure 3. Feedback smoothing with smoothing
variable with position coordinates.
The relation between input and output in
Figure 3ᵉ is readily seen to be
$$V = \frac{d}{dt}\left[D - \mu \frac{d}{dt}(DV)\right]$$
or
$$\mu \frac{d^2}{dt^2}(DV) + V = \frac{dD}{dt} \qquad (4)$$
where $\mu$ is the amplifier gain, D is slant range,
and V = dD/dt is slant range rate.
The principle of the circuit depends upon the
fact that under the assumed target motion the
square of the slant range, $D^2$, should be a
quadratic function of time, so that [D (dD/dt)]
should be a linear function of time and (d/dt)
[D (dD/dt)] should be a constant. This last is
the quantity which is fed back in Figure 3.
If it actually is a constant, it has no further
influence on the calculation, since the forward
circuit includes a differentiator, and the opera-
tion of the circuit is the same as though no
feedback term were present. This can be verified
by setting $D = D_0 = \sqrt{a + 2bt + ct^2}$, corresponding
to ideal straight line flight, in equation (4).
It is readily seen that the equation is
satisfied by
$$V = V_0 = \frac{b + ct}{\sqrt{a + 2bt + ct^2}} = \frac{dD_0}{dt},$$
the first or feedback term being zero.
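The geometric fact used here, that $(d/dt)[D\,(dD/dt)]$ is constant on an unaccelerated straight-line course, is easy to confirm numerically. The Python sketch below is an illustration only, not part of the original report; the starting position, velocity, and step size are hypothetical. It uses the identity $D\,\dot D = \tfrac{1}{2}\,d(D^2)/dt$ and estimates the second derivative of $D^2$ by central differences.

```python
import math


def slant_range(t, p0, v):
    """Slant range D(t) of a target at p0 + v*t (unaccelerated flight)."""
    return math.sqrt(sum((p + vi * t) ** 2 for p, vi in zip(p0, v)))


def feedback_quantity(t, p0, v, h=0.1):
    """Central-difference estimate of d/dt [D * dD/dt] = (1/2) d^2(D^2)/dt^2."""
    def d_squared(s):
        return slant_range(s, p0, v) ** 2
    return 0.5 * (d_squared(t + h) - 2.0 * d_squared(t) + d_squared(t - h)) / h ** 2


p0 = (4000.0, -3000.0, 2000.0)   # hypothetical initial position
v = (-120.0, 60.0, 5.0)          # hypothetical constant velocity
c = sum(vi * vi for vi in v)     # expected constant: the squared speed
values = [feedback_quantity(t, p0, v) for t in (0.0, 10.0, 25.0, 40.0)]
```

Each element of `values` agrees with `c` (the coefficient c of the quadratic $a + 2bt + ct^2$), so on the ideal course the fed-back quantity is constant and disappears after the differentiation in the forward circuit.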
If D does not correspond exactly to straight
line flight, either because of tracking errors
or actual target maneuvers, on the other hand,
the feedback voltage is no longer constant. In
this case transmission around the loop can
exist and the degenerative feedback action
produces smoothing in both the input and the
output voltage. In calculating the exact effect
we must take account of the fact that the feed-
back voltage depends upon the D potentiometer
in the feedback circuit as well as upon the out-
put voltage V. Since the D potentiometer set-
ting must include the errors in the input data,
this means that the output voltage is not per-
fectly smoothed, even with unlimited gain
around the loop. The percentage error in the
output rate tends in the limit to approximate
the percentage error in D itself. For practical
purposes, however, this is a very satisfactory
result, since in the absence of smoothing per-
centage errors in rates are usually many times
those of the corresponding coordinates.
It is apparent that it should be possible to
construct many circuits of this general type
from the differential equations of the trajectory.
A second example is furnished by Figure
4. The operation of the circuit is essentially
similar to that of Figure 3. It depends upon
the fact that in unaccelerated straight line
motion the quantity $D^2 \dot A \cos^2 E$ is a constant.
Instead of multiplying by $D^2$ and $\cos^2 E$ at a
single point in the feedback loop, however,
separate multiplications by D and cos E are
introduced in the forward and feedback circuits.
This permits the output to appear as a
smoothed value of the quantity $D\dot A \cos E$,
which will be recalled as one of the primary
quantities in the circuit of Figure 1.
Figure 4. Another example of feedback smoothing
with smoothing variable with position coordinates.
ᵉ The condensers in Figure 3 symbolize differentiation.
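The constancy of $D^2 \dot A \cos^2 E$ on a straight-line course can also be checked directly, since $D \cos E$ is the horizontal range and the product is the moment of the horizontal velocity about the vertical axis through the observer. The following Python sketch is illustrative and not from the original report; the course parameters are hypothetical and the azimuth rate is estimated by a central difference.

```python
import math


def azimuth(t, p0, v):
    """Azimuth A(t) of a target at p0 + v*t, measured in the horizontal plane."""
    return math.atan2(p0[1] + v[1] * t, p0[0] + v[0] * t)


def areal_quantity(t, p0, v, h=1e-3):
    """D^2 * cos^2(E) * dA/dt, with dA/dt from a central difference."""
    x = p0[0] + v[0] * t
    y = p0[1] + v[1] * t
    horizontal_range_sq = x * x + y * y          # equals D^2 cos^2 E
    a_dot = (azimuth(t + h, p0, v) - azimuth(t - h, p0, v)) / (2.0 * h)
    return horizontal_range_sq * a_dot


p0 = (4000.0, -3000.0, 2000.0)   # hypothetical initial position
v = (-120.0, 60.0, 5.0)          # hypothetical constant velocity
expected = p0[0] * v[1] - p0[1] * v[0]           # x*vy - y*vx at t = 0
samples = [areal_quantity(t, p0, v) for t in (0.0, 10.0, 25.0, 40.0)]
```

The sampled values all agree with `expected`, even though the range, elevation, and azimuth rate each change substantially along the path.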
14.4 NETWORKS VARIABLE WITH TIME
In addition to making the parameters of the
data-smoothing network vary as functions of
the coordinates of target position we may also
make them variable as functions of time. The
advantage of variation with time can be under-
stood by going back to the discussion of the
analytic arc assumption and its consequences
for fixed data-smoothing networks, as given in
Chapters 9, 10, and 11. It will be recalled that
for any given settling time there was an opti-
mum choice of the network's weighting func-
tion. The choice of the settling time itself, how-
ever, was always a compromise. On the one
hand, making the settling time too short led
to too little smoothing, so that the dispersion
in the resulting fire became excessive. On the
other hand, too long a settling time meant that
data from previous unrelated segments were
retained in the smoothing circuit during too
large a proportion of an average individual seg-
ment of the target path, leaving too small a
residue of the average segment as useful firing
time.
It is evident that it is theoretically possible
to escape the consequences of this compromise
by resorting to variable structures. We need
merely assume that the network always has a
weighting function appropriate for a settling
time equal to the time since the last change in
course. This would give a small amount of
smoothing shortly after a change in course,
with more smoothing and consequently greater
accuracy later on. No firing time, however, is
sacrificed waiting for the network to settle.
In order to exploit these possibilities we
must, of course, be able to design networks to
give at least approximately the right sequence
of weighting function. It is also necessary to
provide some sort of auxiliary controlling
mechanism which will sense changes in target
course and return the variable circuits in the
smoothing network proper to their initial posi-
tions. These are both difficult problems which
have been incompletely explored. Some elementary
solutions, based principally upon modifications
of the degenerative feedback smoothing
circuit of Figure 2, of Chapter 10, are, how-
ever, given later in the chapter. As a prelimi-
nary, the next section gives a formal extension
of the general polynomial expansion method of
Chapter 11 to the variable case.
14.5 GENERAL POLYNOMIAL SOLUTION
FOR VARIABLE NETWORKS
The extension of the general method of
Chapter 11 to the variable case requires two
modifications.
1. The lower limit of the integral to be
minimized is now taken as zero, in anticipation
of the possibility of discriminating between rele-
vant and irrelevant data on the basis of time of
arrival.
2. The weighting function may now depend
more generally upon the variable of integration
and the upper limit of integration.
With these modifications there is no longer
any advantage in conducting the analysis in
terms of the age variable $\tau$. To deal directly
with the minimization of the integral
$$\int_0^t \left[E(\lambda) - \hat E(\lambda)\right]^2 W_0(t,\lambda)\, d\lambda , \qquad (5)$$
let
$$\hat E(\lambda) = V_0 + V_1\, G_1(t,\lambda) + \cdots + V_n\, G_n(t,\lambda), \qquad (6)$$
where $G_m(t,\lambda)$ is an mth degree polynomial in
$\lambda$. Also, let
$$\int_0^t W_0(t,\lambda)\, d\lambda = 1$$
$$\int_0^t G_l(t,\lambda)\, G_m(t,\lambda)\, W_0(t,\lambda)\, d\lambda = \begin{cases} 0 & \text{if } l \neq m \\ 1/k_m & \text{if } l = m \end{cases}$$
$$(G_0 = 1,\ k_0 = 1) .$$
Then (5) is a minimum with respect to the
$V_m$'s in (6) if
$$V_m(t) = \int_0^t E(\lambda)\, W_m(t,\lambda)\, d\lambda \qquad (7)$$
where
$$W_m(t,\lambda) = k_m\, G_m(t,\lambda)\, W_0(t,\lambda) . \qquad (8)$$
The possibility of physically realizing the
$V_m(t)$ depends upon the possibility of realizing
networks with impulsive admittances $W_m(t,\lambda)$
in the sense that $W_m(t,\lambda)$ is the response of a
network, at time t, to a unit impulse applied at
time $\lambda$, where $0 < \lambda < t$. Taking this possibility
for granted, the predicted value $\hat E(t + t_f)$ is,
according to (6), a variable linear combination
of the $V_m(t)$, viz.,
$$\hat E(t + t_f) = V_0(t) + G_1(t, t + t_f)\, V_1(t) + \cdots + G_n(t, t + t_f)\, V_n(t). \qquad (9)$$
It is clear that all of the $W_m(t,\lambda)$ as well as
all of the $G_m(t,\lambda)$ for m = 1, 2, . . . are determined
by $W_0(t,\lambda)$. The latter is determined as
the best weighting function for position data
smoothing, depending upon the characteristics
of the noise associated with the position data.
The general methods of determining the best
weighting function with fixed smoothing time,
described in Chapter 10, may be used to deter-
mine the best weighting function with variable
smoothing time.
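Under the assumption that the spectrum of
the noise associated with the signal S(t) has a
uniform slope of 6k db per octave, we may take
over from Section 11.3 the result that the best
weighting function is
$$w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2}\,\frac{1}{t}\left[\frac{\lambda}{t}\left(1 - \frac{\lambda}{t}\right)\right]^k , \qquad 0 \le \lambda \le t . \qquad (10)$$
The response of the network is then
$$V(t) = \int_0^t S(\lambda)\, w_k(t,\lambda)\, d\lambda . \qquad (11)$$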
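As a check on the normalization of the weighting function, the short Python sketch below integrates $w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2}\frac{1}{t}\left[\frac{\lambda}{t}\left(1-\frac{\lambda}{t}\right)\right]^k$ numerically over $0 \le \lambda \le t$ for a few values of k. It is an illustration only: the helper names, the midpoint rule, and the sample smoothing time t = 10 are assumptions, not part of the original report.

```python
import math


def w_k(t, lam, k):
    """Weighting function of order k on 0 <= lam <= t."""
    x = lam / t
    return math.factorial(2 * k + 1) / math.factorial(k) ** 2 * (x * (1.0 - x)) ** k / t


def integrate(f, a, b, n=2000):
    """Midpoint-rule quadrature of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Area under each weighting function for a hypothetical smoothing time t = 10.
areas = {k: integrate(lambda lam: w_k(10.0, lam, k), 0.0, 10.0) for k in (0, 1, 2)}

# A linear signal S(lam) = lam is averaged to its midpoint value t/2,
# because every w_k is symmetric about lam = t/2.
mean_of_ramp = integrate(lambda lam: lam * w_k(10.0, lam, 1), 0.0, 10.0)
```

Each area is unity to within the quadrature error, so the response is a true weighted average of the signal for every k.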
14.6 SPECIAL CASES
It will be illuminating to consider a few
special cases of (11).
For k = 0, we have
$$V(t) = \frac{1}{t}\int_0^t S(\lambda)\, d\lambda . \qquad (12)$$
Multiplying through by t and differentiating
we get
$$t\,\dot V(t) + V(t) = S(t) . \qquad (13)$$
This suggests the circuit shown in Figure 5.ᶠ
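The uniform-weighting case has a familiar discrete counterpart: a first-order smoother whose gain falls off as 1/n reproduces the running arithmetic mean. The Python sketch below shows this under the assumption of unit sample spacing; it is an illustration, not a description of the circuit of Figure 5, and the function name is hypothetical.

```python
def running_average(samples):
    """Discrete analogue of t*dV/dt + V = S: the correction gain 1/n shrinks
    with the number of samples received since the start."""
    v = 0.0
    out = []
    for n, s in enumerate(samples, start=1):
        v += (s - v) / n     # dV = (S - V) / t with unit sample spacing
        out.append(v)
    return out


history = running_average([2.0, 4.0, 6.0, 8.0])
```

Here `history` is the sequence of means of the first one, two, three, and four samples, so early outputs respond quickly while later outputs are smoothed more heavily.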
For k = 1, we have
$$V(t) = \frac{6}{t^3}\int_0^t S(\lambda)\,\lambda\,(t - \lambda)\, d\lambda .$$
Multiplying through by $t^3$ and differentiating
twice we get
$$\tfrac{1}{6}\,t^2\,\ddot V + t\,\dot V + V = S$$
which may be written in the form
This suggests the network shown in Figure 6.ᵍ
14.7
NETWORKS WITH A LIMITED
RANGE OF VARIATION
By generalizing the above results in various
ways a large number of other examples of
variable smoothing networks can be constructed.
Since unlimited variation in the smoothing
time is not practically possible, or perhaps even
tactically optimal, however, it is desirable in
discussing any further examples to include also
the possibility that the range of variation in
the network may be restricted. For any positive
integral value of k in (11) the differential
equation for V(t) is of the type which may be
reduced by the transformation $t = e^{x}$ to a linear
differential equation with constant coefficients.ʰ
In general, this facilitates the determination of
what happens to the weighting function
$w_k(t,\lambda)$ when t > T if the variability of the
network is stopped at time T. In the case of the
first-order equation (13), however, it is just
as easy to deal directly in terms of the natural
time.
A more general form for (13), which readily
yields the effects of a sudden or gradual stoppage
of the variability of the network, is
$$\frac{\phi(t)}{\dot\phi(t)}\,\dot V(t) + V(t) = S(t) . \qquad (14)$$
This corresponds to the response
$$V(t) = \frac{1}{\phi(t)}\int_0^t \dot\phi(\lambda)\, S(\lambda)\, d\lambda$$
whence the weighting function is
$$w(t,\lambda) = \frac{\dot\phi(\lambda)}{\phi(t)} . \qquad (15)$$
ᶠ This circuit is due to S. Darlington.
ᵍ Due to B. T. Weber.
ʰ See Section A.11 for a more general transformation.
The general relation (14) may be realized
with the network of Figure 5, by varying the
resistance in accordance with
$$R = \frac{1}{C}\,\frac{\phi(t)}{\dot\phi(t)} , \qquad t > 0 .$$
However, a more practical circuit results from
the introduction of variable potentiometersⁱ in
both the capacity and resistance paths of the
original feedback smoothing circuit of Figure
2, Chapter 10. This is shown in Figure 7.ʲ It
may be noted that the feedback circuit is also
applicable to the two cases discussed in the
preceding section. It has the advantage for
these applications that it does not require the
zero-impedance generators and infinite-impedance
loads of Figures 5 and 6.
Figure 5. Time-variable smoothing circuit giving
uniform weighting function.
Figure 6. Time-variable smoothing circuit giving
parabolic weighting function.
Figure 7. Limited range time-variable feedback
smoothing circuit.
As an example of (14) we may take
$$\phi(t) = \begin{cases} t & 0 < t < T \\ T e^{(t-T)/T} & t > T . \end{cases}$$
Then
$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t & 0 < t < T \\ T & t > T . \end{cases}$$
Hence, in Figure 7, if RC = T,
$$f_C(t) = \frac{t}{T}, \quad f_R(t) = 0 , \qquad 0 < t < T$$
$$f_C(t) = 1, \quad f_R(t) = 1 , \qquad t > T .$$
This example obviously calls for a linear potentiometer
in the condenser path and a switch in
the resistance path. The weighting function obtained
is, by (15),
$$w(t,\lambda) = \begin{cases} \dfrac{1}{t} & 0 < \lambda < t < T \\[4pt] \dfrac{1}{T}\, e^{-(t-T)/T} & 0 < \lambda < T < t \\[4pt] \dfrac{1}{T}\, e^{-(t-\lambda)/T} & 0 < T < \lambda < t . \end{cases}$$
This is illustrated in Figure 8 for T = 10, t = 5,
10, 20.
Figure 8. First example of weighting function
produced by circuit of Figure 7.
ⁱ In some cases a variable potentiometer may turn
out to be a switch.
ʲ This circuit is due to S. Darlington.
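The first example can be reproduced numerically from the general rule that the weighting function is the derivative of $\phi$ at the time of arrival divided by $\phi$ at the time of observation. The Python sketch below is illustrative only; the function names are hypothetical and the values T = 10 and t = 5, 20 are taken from the description of Figure 8.

```python
import math


def phi(t, T):
    """phi(t) = t up to time T, then T*exp((t - T)/T): variation stops at T."""
    return t if t < T else T * math.exp((t - T) / T)


def phi_dot(t, T):
    """Time derivative of phi."""
    return 1.0 if t < T else math.exp((t - T) / T)


def weight(t, lam, T):
    """w(t, lam) = phi_dot(lam) / phi(t) for 0 < lam < t."""
    return phi_dot(lam, T) / phi(t, T)


def total_weight(t, T, n=4000):
    """Midpoint-rule integral of the weighting function over 0 < lam < t."""
    h = t / n
    return h * sum(weight(t, (i + 0.5) * h, T) for i in range(n))


w_before_stop = weight(5.0, 2.0, 10.0)     # uniform weight 1/t while t < T
w_after_stop = weight(20.0, 15.0, 10.0)    # exponential tail once t > T
area = total_weight(20.0, 10.0)
```

For t < T the weight is the uniform value 1/t; for t > T data received after T are weighted exponentially and data received before T are weighted uniformly at a reduced level, while the total weight stays at unity.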
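A second example is furnished by taking
$$\phi(t) = \begin{cases} t^k & 0 < t < T \\ T^k e^{k(t-T)/T} & t > T . \end{cases}$$
Then
$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} \dfrac{t}{k} & 0 < t < T \\[4pt] \dfrac{T}{k} & t > T . \end{cases}$$
Hence in Figure 7, if RC = T/k,
$$f_C(t) = \frac{t}{T}, \quad f_R(t) = 1 - \frac{1}{k} , \qquad 0 < t < T$$
$$f_C(t) = 1, \quad f_R(t) = 1 , \qquad t > T .$$
The first example is a special case of this one.
The weighting function obtained is, by (15),
$$w(t,\lambda) = \begin{cases} \dfrac{k\lambda^{k-1}}{t^k} & 0 < \lambda < t < T \\[4pt] \dfrac{k\lambda^{k-1}}{T^k}\, e^{-k(t-T)/T} & 0 < \lambda < T < t \\[4pt] \dfrac{k}{T}\, e^{-k(t-\lambda)/T} & 0 < T < \lambda < t . \end{cases}$$
This is illustrated in Figure 9 for k = 3/2,
T = 10, t = 5, 10, 20.
Figure 9. Second example of weighting function
produced by circuit of Figure 7.
A third example is furnished by taking
$$\phi(t) = \begin{cases} \dfrac{Tt}{2T - t} & 0 < t < T \\[4pt] T e^{2(t-T)/T} & t > T . \end{cases}$$
Then
$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t\left(1 - \dfrac{t}{2T}\right) & 0 < t < T \\[4pt] \dfrac{T}{2} & t > T . \end{cases}$$
Hence, in Figure 7, if RC = T/2,
$$f_C(t) = \frac{2t}{T}\left(1 - \frac{t}{2T}\right), \quad f_R(t) = \frac{t}{T} , \qquad 0 < t < T$$
$$f_C(t) = 1, \quad f_R(t) = 1 , \qquad t > T .$$
The weighting function obtained is, by (15),
$$w(t,\lambda) = \begin{cases} \dfrac{1}{t}\,\dfrac{1 - t/2T}{(1 - \lambda/2T)^2} & 0 < \lambda < t < T \\[6pt] \dfrac{1}{2T}\,\dfrac{e^{-2(t-T)/T}}{(1 - \lambda/2T)^2} & 0 < \lambda < T < t \\[6pt] \dfrac{2}{T}\, e^{-2(t-\lambda)/T} & 0 < T < \lambda < t . \end{cases}$$
This is illustrated in Figure 10 for T = 10,
t = 5, 10, 20.
Figure 10. Third example of weighting function
produced by circuit of Figure 7.
A fourth example is furnished by taking
$$\phi(t) = e^{kt} - 1 , \qquad t > 0 .$$
Then
$$\frac{\phi(t)}{\dot\phi(t)} = \frac{1 - e^{-kt}}{k} , \qquad t > 0 .$$
Hence, in Figure 7, if RC = 1/k,
$$f_C(t) = f_R(t) = 1 - e^{-kt} , \qquad t > 0 .$$
The weighting function obtained is, by (15),
$$w(t,\lambda) = \frac{k}{1 - e^{-kt}}\, e^{-k(t-\lambda)} , \qquad 0 < \lambda < t .$$
For any value of t this weighting function is
exponential in $\lambda$.
14.8 OTHER EXAMPLES
Because there has been no demand for variable
networks in the field of communications,
the technique of designing practical variable
networks is in a very rudimentary stage compared
to that of designing fixed networks. In
the remainder of this chapter we shall describe
some of the circuits which have been developed
for specific practical applications.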
A memory point method of obtaining
smoothed rates, based upon (12), is illustrated
below. If S(t), the quantity to be smoothed,
represents the time derivative $\dot E(t)$ of the position
data E(t), then the average rate is given
by
$$V(t) = \frac{1}{t}\int_0^t \dot E(\lambda)\, d\lambda = \frac{E(t) - E(0)}{t} .$$
Under the assumption that the position data,
aside from tracking errors, is a linear function
of time, the average rate is also the smoothed
rate. If the position data is represented by the
angular displacement of a shaft in the computer,
the quantity E(0) is readily fixed by
providing a second shaft which is coupled to
the first shaft until t = 0 when the coupling is
broken. Potentiometers mounted on the shafts
are energized by a voltage varying as a function
of time in the manner indicated in Figure
11. The manner in which the smoothed rate is
obtained is clear from the figure.
Figure 11. Memory point method of obtaining
smoothed rate.
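In computational terms the memory point method amounts to dividing the displacement since the memory point by the elapsed time. The Python sketch below illustrates how the effect of a bounded tracking error on the rate falls off as the elapsed time grows. It is not from the original report: the function names, the sample times, and the tracking errors are hypothetical.

```python
def memory_point_rate(position_now, memory_position, elapsed):
    """Average rate since the memory point: [E(t) - E(0)] / t."""
    if elapsed <= 0.0:
        raise ValueError("elapsed time since the memory point must be positive")
    return (position_now - memory_position) / elapsed


def rate_history(times, positions):
    """Smoothed rate at each later sample, with the first sample as memory point."""
    t0, e0 = times[0], positions[0]
    return [memory_point_rate(e, e0, t - t0)
            for t, e in zip(times[1:], positions[1:])]


times = [0.0, 1.0, 2.0, 4.0, 8.0]
errors = [0.0, 0.2, -0.2, 0.2, -0.2]      # hypothetical tracking errors
positions = [3.0 + 0.5 * t + e for t, e in zip(times, errors)]
rates = rate_history(times, positions)
rate_errors = [abs(r - 0.5) for r in rates]
```

With a true rate of 0.5 and tracking errors of fixed size, the entries of `rate_errors` halve each time the elapsed time doubles, which is the increasing smoothing with time that the variable networks of this chapter aim to provide.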
The memory point method of obtaining
smoothed rates is used in the T15 antiaircraft
director.⁴ In this application, however, it is
somewhat more complicated than in the simple
illustration described above. This is due to the
fact that the position data and the memory
point are in the polar coordinate system,
whereas the rate components are referred to
a tilted and rotating rectangular coordinate
system which is determined by the instantaneous
line of sight.
Figure 12 shows a way of securing variable
smoothing in a purely electrical circuit.ᵏ Except
for the fact that the division of the current
through the condensers is varied discontinuously
instead of continuously, this circuit corresponds
to the first or the second example discussed
in Section 14.7.
Figure 12. Specific limited range time-variable
feedback smoothing circuit.
Figure 13 shows the variable smoothing circuitˡ
for smoothing first derivatives in the
M9A1-E1 antiaircraft director.⁸ This circuit
corresponds approximately to the second example
of the differential equation (14) given
above. The variable element is a thermistor
which is heated up to a high temperature, practically
instantaneously, by the heater, and then
allowed to cool off naturally. By choosing the
electrical and thermal constants in the circuit
correctly the resulting smoothing can be made
to approximate that obtained in a memory
point circuit.
Figure 13. Another specific limited range time-variable
feedback smoothing circuit.
ᵏ This circuit is due to S. Darlington.
ˡ Developed by R. F. Wick.
As noted earlier, all these variable circuits
require some auxiliary control means to reset
the variable circuits to zero whenever a new
target is engaged or the current target makes
a sudden change in course. In the T15 memory
point system this function was performed by an
operator. The operator was aided by a series of
meters which compared the instantaneous
memory point rates with average rates set in
some time previously by hand. The visual in-
dication of a change in course, calling for the
selection of a new memory point, was a rela-
tively large, smoothly and decisively varying
deflection on the meters. In contrast, normal
tracking errors appeared as relatively small
random fluctuations of the needles. The circuits
of Figures 7 and 12, which were intended for
bombsight applications, were also under the
control of an operator, who was supposed to
start the mechanism at the beginning of each
bombing run.
Two control methods were used for the cir-
cuit of Figure 13. In one, large changes in rate,
corresponding to probable changes in target
course, were distinguished by comparing the
instantaneous value of the target rate, as ob-
tained directly from a differentiator, with the
smoothed value obtained at the output of the
smoothing circuit. In the other method, equiva-
lent information was obtained by again differ-
entiating the instantaneous value of the target
rate, making a second derivative of the target
coordinate. In either case this rate difference
or second derivative information was used to
control a gas tube, which went off, supplying
heating current to the variable thermistor,
whenever the voltage applied to it exceeded a
certain threshold. This threshold evidently
marks the minimum change in course for which
the variable network will be reset. In order to
permit the use of a low threshold, without
making the circuit unduly liable to false opera-
tion because of the effect of tracking errors,
the gas tube input voltage was first transmitted
through a low-pass filter which suppressed
most of the energy due to tracking errors. A
considerable amount of work was done on the
proportioning of this filter to provide the best
protection against false operation with a low
threshold and with minimum delay in resetting
in case a change of course actually does occur,
but the problem remains an interesting subject
for research.
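The first control method (rate difference, low-pass filter, threshold) can be pictured with a discrete-time analogue. The Python sketch below is a hypothetical illustration and does not describe the actual gas tube circuit: the function name, the simple first-order filters, and all constants are assumptions chosen only to show the logic.

```python
def course_change_detector(raw_rates, mu_smooth, mu_filter, threshold, dt=1.0):
    """Return the sample indices at which the variable network would be reset.

    The difference between the instantaneous and smoothed rates is passed
    through a low-pass filter; a reset is triggered when the filtered
    difference exceeds the threshold."""
    smoothed = raw_rates[0]
    filtered_diff = 0.0
    resets = []
    for i, r in enumerate(raw_rates):
        smoothed += (r - smoothed) * dt / mu_smooth
        filtered_diff += ((r - smoothed) - filtered_diff) * dt / mu_filter
        if abs(filtered_diff) > threshold:
            resets.append(i)
            smoothed = r           # restart smoothing on the new course
            filtered_diff = 0.0
    return resets


# Hypothetical data: rate 10 with +/-0.5 tracking error, changing to 16 at sample 20.
raw = [10.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(20)]
raw += [16.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(20, 40)]
resets = course_change_detector(raw, mu_smooth=10.0, mu_filter=3.0, threshold=2.0)
```

In this run the tracking errors alone never exceed the threshold, and a single reset occurs within a sample or two of the change in course. A longer filter time constant would permit a lower threshold at the cost of more delay in resetting, which is the trade-off discussed above.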
APPENDIX A
NETWORK THEORY
THIS APPENDIX GIVES a summary of linear
network theory which is pertinent to the
analysis and design of data-smoothing and
prediction circuits. It is incomplete in many
respects and should therefore be supplemented
by reference to established textbooks on the
subject. However, it contains some results
which are new.
The present summary will be concerned
mainly with fixed linear networks. Variable
linear networks will be considered briefly in
the last section.
A.1 IMPULSIVE ADMITTANCE
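A fixed linear transmission network is one in
which the response V(t) is related to the impressed
signal E(t) by a linear differential
equation of the form
$$b_n \frac{d^n V}{dt^n} + b_{n-1}\frac{d^{n-1} V}{dt^{n-1}} + \cdots + b_0 V = a_m \frac{d^m E}{dt^m} + a_{m-1}\frac{d^{m-1} E}{dt^{m-1}} + \cdots + a_0 E \qquad (1)$$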
with constant coefficients. It is well-known that
the solutions of such a differential equation
obey the "superposition principle." This makes
it possible to formulate the response of the net-
work to any signal, in terms of its response to
certain standard signals.
A convenient standard signal for analytical
purposes is the "unit impulse." It may be regarded
as the limit of the rectangular pulse
shown in Figure 1 as the duration of the pulse
is decreased indefinitely while the amplitude is
increased in such a way that the area under
the pulse is always unity. The limiting function
thus defined does not exist in a strict mathematical
sense. However, it is very convenient
for analytical purposes, and seldom leads to
difficulties, to proceed as though the limiting
function did exist. An impulse occurring at
$t = \lambda$ is conventionally denoted by the singular
function $\delta_0(t - \lambda)$ where
$$\delta_0(\tau) = 0 \quad \text{if } \tau \neq 0$$
$$\int_{-\infty}^{t} \delta_0(\tau)\, d\tau = \begin{cases} 0 & \text{if } t < 0 \\ 1 & \text{if } t > 0 . \end{cases}$$
Figure 1. Rectangular pulse signal.
The response of a fixed network to an im-
pulse or any form of signal is independent of
the time at which the signal is applied, provided
it is expressed as a function of the time relative
to the application of the signal. Let W(t) be
the response to the signal $\delta_0(t)$. This is called
the "impulsive admittance" of the network.
Physically, it must be identically zero for negative
values of t. For an impulse applied at $t = \lambda$
the response will therefore be $W(t - \lambda)$, which
is identically zero for $t < \lambda$.
Figure 2. Derivation of superposition theorem.
A physical signal E(t) such as the one shown
in Figure 2 may be resolved into an infinite
succession of elementary impulses. The strength
of the typical elementary impulsive component,
such as the one shown in Figure 2 as occurring
at time $\lambda$, is $E(\lambda)\,d\lambda$. Its contribution to the
response at time t is $E(\lambda)\,W(t - \lambda)\,d\lambda$. Hence
the contribution of all the elementary impulsive
components of the signal, to the response at
time t, is given by the formulaᵃ
$$V(t) = \int_{-\infty}^{+\infty} E(\lambda)\, W(t - \lambda)\, d\lambda . \qquad (2)$$
This is one form of the "superposition theo-
rem" for fixed linear networks.
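The superposition integral has a direct discrete form in which the signal and the impulsive admittance are sampled at a common interval. The Python sketch below is a minimal illustration under that assumption; the function name, the exponential admittance, and the sample spacing are hypothetical and not taken from the original text.

```python
import math


def superpose(signal, admittance, dt):
    """Discrete form of V(t) = integral of E(lam) * W(t - lam) d(lam).

    signal[m] is E at time m*dt; admittance[k] is W at time k*dt, with W
    taken as zero for negative argument and beyond the stored samples."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for m in range(n + 1):
            k = n - m
            if k < len(admittance):
                acc += signal[m] * admittance[k] * dt
        out.append(acc)
    return out


dt, tau = 0.02, 0.5                       # hypothetical spacing and time constant
W = [math.exp(-k * dt / tau) / tau for k in range(400)]   # unit-area admittance
step = [1.0] * 400                        # constant signal applied at t = 0
response = superpose(step, W, dt)
doubled = superpose([2.0 * s for s in step], W, dt)
```

Because the admittance has approximately unit area, the response to a constant signal settles near the value of the signal itself, and doubling the signal doubles the response, as the superposition principle requires.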
Before discussing the reasons for the limits
of integration indicated in (2), it will be help-
ful to consider a graphical interpretation other
than the one used in deriving the integral. Let
W(t) be of the form shown in Figure 3, and let
$E(\lambda)$ be of the form shown in Figure 4. To
determine the response V(t) at a given value
of t, the curve in Figure 3 is turned over from
right to left and placed over the curve in Figure
4 so that its right-hand edge is at $\lambda = t$. The
product of the two curves gives a third curve
(not shown), which is identically zero for all
$\lambda > t$. The area under the third curve is the response
V(t) at the given value of t. For progressively
larger values of t, the curve representing
$W(t - \lambda)$ in Figure 4 is simply slid to
the right with respect to the curve representing
$E(\lambda)$.
Figure 3. An impulsive admittance.
Figure 4. Graphical interpretation of superposition theorem.
Since a physical signal must certainly be
identically zero up to some definite time, or
since it must certainly have been applied to the
network at some definite time, that time could
be taken arbitrarily as zero and (2) could be
written in the form
$$V(t) = \int_0^t E(\lambda)\, W(t - \lambda)\, d\lambda . \qquad (3)$$
In this form, however, since
$$\int_0^t W(t - \lambda)\, d\lambda$$
is in general a function of t, the response could
not be interpreted as a weighted average of the
signal. On the other hand, since
$$\int_{-\infty}^{+\infty} W(t - \lambda)\, d\lambda = \int_0^{\infty} W(\tau)\, d\tau$$
is independent of t, the response may be interpreted
as a weighted average of the signal, if
$$\int_0^{\infty} W(\tau)\, d\tau = 1 .$$
The necessity of taking the lower limit in (2)
as $-\infty$, in order to permit the interpretation
of the response as a weighted average of the
signal, is also expressed by the point of view
that a fixed network cannot make any physical
distinction between having no applied signal
and having an applied signal which happens to
be of zero amplitude.
Another shortcoming of the form (3) or, for
that matter, of the form (2) if we set t as the
upper limit of integration, comes from the
consideration of impulsive admittances of such a
nature that W(t − λ) has certain kinds of
singularities at λ = t. For example, the case for
direct transmission, expressed in the form
$$V(t) = \int_{-\infty}^{t} E(\lambda) \cdot \delta_0(t - \lambda)\, d\lambda$$
is ambiguous because the singularity in the
integrand occurs exactly at one end of the
range of integration. However, the form
$$V(t) = \int_{-\infty}^{t+} E(\lambda) \cdot \delta_0(t - \lambda)\, d\lambda$$
leads, without ambiguity, to the result
V(t) = E(t). This example is not trivial. Every
network which transmits infinite frequency
must have an impulsive admittance of such a
nature that W(t − λ) contains a singularity of
the form δ_0(t − λ). Any attempt to rule out such
a singularity on the ground that physical
networks cannot in fact transmit infinite
frequency, complicates the analysis and design of
networks unduly. If a network is capable of,
or is expected to transmit, frequencies at the
top of the range of interest or importance, it is
simpler to assume that the network is capable
of, or is expected to transmit, all frequencies
above that range.
One other advantage of taking the limits of
integration as indicated in (2) may be called
to attention. Keeping in mind that E(λ) is
identically zero for all values of λ below some
definite though perhaps unknown value, and
that W(t − λ) is identically zero for all values
of λ > t, it is clear that (2) may be integrated
partially any number of times without incurring
the burden of carrying a string of terms
outside of the integral. After one partial
integration we have
$$V(t) = \int_{-\infty}^{t+} E'(\lambda) \cdot A(t - \lambda)\, d\lambda \qquad (4)$$
where
$$A(t) = \int_{0-}^{t} W(\tau)\, d\tau. \qquad (5)$$
Since E'(λ) is identically zero in any range of
λ in which E(λ) is identically zero, and since
A(t − λ) is identically zero for all values of
λ > t, a second partial integration may be
performed with no more formal complication than
the first partial integration. The fact of the
matter is that the terms which ordinarily arise
in partial integrations, outside of the integral,
are here carried under the integral by
singularities of the integrand.
The superposition theorem in the form (4)
may be derived directly in a manner similar to
the derivation of (2). A(t − λ) is the response
of the network to a Heaviside unit step function
H(t − λ) applied at t = λ, where
$$H(t - \lambda) = \begin{cases} 0 & \text{when } t < \lambda \\ 1 & \text{when } t > \lambda. \end{cases}$$
The signal is resolved into an infinite succession
of elementary step functions of amplitude
E'(λ)dλ wherever E(λ) is continuous, and
finite step functions of amplitude dE(λ) wherever
E(λ) has a finite discontinuity. The
contribution of each elementary step function to the
response at time t is E'(λ)·A(t − λ)dλ, that
of each finite step function is A(t − λ)·dE(λ).
Hence, the response is given formally by (4)
with the understanding that E'(λ)dλ is to be
interpreted as dE(λ) wherever E(λ) is
discontinuous.*
The response A (t) of the network to a
Heaviside unit step function H(t) applied at
t = 0 is called the "indicial admittance" of the
network. It is more familiar, in the field of
linear transmission theory, than the impulsive
admittance to which it is related by (5), but in
this monograph preference is given to the use
of the impulsive admittance. In the theory of
linear differential equations the impulsive ad-
mittance is known as a Green's function.
It is often convenient to express the response
so that the variable of integration represents
the age of the elementary components of the
signal. Introducing the age variable
$$\tau = t - \lambda \qquad (6)$$
into (2), we have
$$V(t) = \int_{0-}^{\infty} E(t - \tau) \cdot W(\tau)\, d\tau. \qquad (7)$$
In this form it is clear that the weighting of
signal components is on the basis of age only.
A fixed network may be said to have a memory
which is a function only of the age of past
events.
* Formula (4) may be written in the Stieltjes form
$$V(t) = \int_{-\infty}^{t+} A(t - \lambda)\, dE(\lambda).$$
Alternatively, we may take the point of view that
E'(λ) contains impulsive singularities wherever E(λ)
is discontinuous. This point of view is generalized in
Appendix B.
In the preliminary stages of designing a
smoothing network, the weighting function
W(τ) is generally prescribed to be identically
zero when τ > T say, as well as when τ < 0.
This does not violate the conditions of physical
realizability. However, such a weighting function
cannot be obtained exactly with a network
of a finite number of discrete impedance ele-
ments. A finite network invariably yields a
weighting function with a "tail" which extends
to infinity.
A.2 TRANSMISSION FUNCTION
Theoretically, the impulsive admittance of a
prescribed network may be determined directly
from the differential equations of the network
in a perfectly straightforward manner. Prac-
tically, however, it is very difficult to do so if
the network has more than two meshes. Fur-
thermore, the technical problem of designing
a network directly from a prescribed impulsive
admittance is even more difficult, particularly
if the impulsive admittance is not exactly re-
alizable.
These difficulties may be avoided by recourse
to the highly developed methods of network
analysis and synthesis used in the field of com-
munication circuits. These methods are based
upon the steady-state properties of networks.
If a signal consisting of the single sinusoid
cos ωt is applied to an invariable or fixed
linear transmission network, the steady-state
response^b will also be a single sinusoid of the
same frequency. The amplitude and phase of
the response, relative to the signal, will in
general depend upon the frequency. The
response may be regarded as the resultant of an
"inphase component" proportional to cos ωt,
and a "quadrature component" proportional to
sin ωt, with amplitude coefficients which are
functions of the frequency. Furthermore, since
the signal is an even function of the frequency,
the response should also be an even function
of the frequency.^c Hence, the response will
be of the form G(ω²) cos ωt − ωH(ω²) sin ωt,
where G and H are even real functions of
frequency.
^b This is the response apart from transient components,
assuming that the latter vanish exponentially
with time after the signal is impressed.
^c The signal is also an even function of the time but
this is due only to the particular choice of origin which
is arbitrary.
By a suitable shift of the origin of time it
follows that if the impressed signal is sin ωt,
the steady-state response will be of the form
G(ω²) sin ωt + ωH(ω²) cos ωt.
These two results may be combined into a
simpler expression without any loss of
individuality. Since e^{iωt} = cos ωt + i sin ωt where
i = √−1, we have
$$V(t) = \left[G(\omega^2) + i\omega H(\omega^2)\right] e^{i\omega t} \quad \text{if } E(t) = e^{i\omega t}.$$
A further simplification may be achieved by
replacing iω by p, and G(−p²) + pH(−p²) by
Y(p), so that
$$V(t) = Y(p) \cdot e^{pt} \quad \text{if } E(t) = e^{pt}. \qquad (8)$$
Y(p) is called the "steady-state transmission
function" or just "transmission function" for
short.
Strictly speaking, (8) expresses the relation
of steady-state response to signal only if p = iω.
However, it is customarily called a steady-state
relation even when p is not a pure imaginary
quantity. It may be noted that Y(p) is real
when p is real.
The simplicity of steady-state analysis derives
from the fact that time occurs in the
signal and throughout the network only in the
form e^{pt}. In particular, the determination of
the transmission function is reduced to the
solution of simultaneous algebraic equations
which do not involve the time factor. For a net-
work in which the signal and the response are
related by the linear differential equation (1)
with constant coefficients, we obtain simply
$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_m p^m}{b_0 + b_1 p + \cdots + b_n p^n}.$$
It may be noted that the poles of the transmis-
sion function, also referred to as "infinite-gain
points" in the p-plane, correspond to the roots
of the characteristic function of the differential
equation. Physical restrictions on the location
of infinite-gain points will be considered in Sec-
tion A.9.
A.3 RELATIONSHIP BETWEEN
IMPULSIVE ADMITTANCE AND
TRANSMISSION FUNCTION
A relationship between the impulsive admittance
and the transmission function of a network
may be obtained from (7). Putting
E(t) = e^{pt} when t > 0, we get
$$V(t) = e^{pt}\int_{0-}^{t} W(\tau)\, e^{-p\tau}\, d\tau = e^{pt}\int_{0-}^{\infty} W(\tau)\, e^{-p\tau}\, d\tau - e^{pt}\int_{t}^{\infty} W(\tau)\, e^{-p\tau}\, d\tau \qquad (9)$$
The second term in (9) is a transient term due
to the fact that we have taken E(t) ≡ 0 when
t < 0. The first term in (9), which involves the
time only through e^{pt}, is the steady-state term.
Comparing this term with (8) we get
$$Y(p) = \int_{0-}^{\infty} W(t)\, e^{-pt}\, dt \qquad (10)$$
or, in the notation which will be introduced in
the next section,
$$Y(p) = L[W(t)]. \qquad (11)$$
A.4 LAPLACE AND INVERSE LAPLACE
TRANSFORMS
The frequent use which is made of the
Laplace transform and its inverse, in the
analysis and design of fixed linear networks,
warrants a brief discussion of these transforms.
Given a function f(t) which is identically
zero when t < 0, its Laplace transform g(p) is
defined by the formula
$$g(p) = L[f(t)] = \int_{0-}^{\infty} f(t)\, e^{-pt}\, dt \qquad (12)$$
This is usually written with 0 for the lower
limit, but by having the point t = 0 inside the
range of integration, instead of at the end, we
secure the same advantages for (12) that we
gained in the case of (2) by having the point
λ = t inside the range of integration. Since f(t)
is identically zero when t < 0 we could write
−∞ for the lower limit in (12), but this would
run the risk of confusion with the so-called
"bilateral Laplace transform." On the whole,
it is worth while to have a constant reminder
that functions f(t) which are not identically
zero when t < 0 are ruled out.
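The relation Y(p) = L[W(t)] can be checked numerically. The sketch below, which is only an illustration, approximates the defining integral by a truncated midpoint sum for the hypothetical impulsive admittance W(t) = αe^{−αt}, whose transform is known in closed form to be α/(p + α). The truncation time, step size and function names are assumptions of the example.

```python
import cmath
import math

def laplace_numeric(f, p, t_max=20.0, dt=1e-3):
    """Truncated midpoint-sum approximation of the integral of f(t) * exp(-p t) dt."""
    n = int(round(t_max / dt))
    total = 0.0 + 0.0j
    for k in range(n):
        t = (k + 0.5) * dt
        total += f(t) * cmath.exp(-p * t) * dt
    return total

ALPHA = 2.0  # hypothetical decay constant

def w(t):
    # Hypothetical impulsive admittance W(t) = ALPHA * exp(-ALPHA * t) for t >= 0.
    return ALPHA * math.exp(-ALPHA * t)

y_real = laplace_numeric(w, 1.0)    # real p: the result should be essentially real
y_freq = laplace_numeric(w, 3.0j)   # p = i*omega with omega = 3
exact_real = ALPHA / (1.0 + ALPHA)
exact_freq = ALPHA / (3.0j + ALPHA)
```

The first value illustrates the remark that Y(p) is real when p is real; the second is the steady-state transmission at a real frequency.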
The integral in (12) is usually not convergent
for all values of p. That is, in order to
secure convergence of the integral, it may be
necessary to assume R(p) > a, where R(p) is
the real part of p, and a is a real number. The
result of the integration is a representation of
g(p) in the half-plane R(p) > a. Since the
representation is analytic throughout the
half-plane, the principle of analytic continuation
allows us to extend the definition of g(p) to
the remainder of the p-plane.
Given a function g(p) which is analytic
throughout the half-plane R(p) > c where c is
a real number, its inverse Laplace transform
f(t) is given by the formula
$$f(t) = L^{-1}[g(p)] = \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} g(p)\, e^{pt}\, dp \qquad (13)$$
provided f(t) is identically zero when t < 0.
If the result of the integration in (13) is not
identically zero when t < 0, g(p) is not a
Laplace transform and the application of the
inverse transformation to it is meaningless.
Translation Theorem
A useful theorem can be established at this
point. This is the translation theorem.
If
$$G(p) = L[F(t)]$$
then
$$L^{-1}\left[G(p)\, e^{-pa}\right] = F(t - a)$$
provided that F(t − a) ≡ 0 when t < 0. Translation
is to the right or left according as a is
positive or negative.
If it happens that F(t) ≡ 0 when t < t_0
where t_0 > 0, then the restriction is that
a > −t_0. That is, a limited amount of translation
to the left is permissible. In general, t_0 = 0
and the restriction is therefore that a > 0. This
theorem follows readily from (12) or (13).
In all of the applications of (13) which we
have any occasion to make in the analysis and
design of fixed linear networks, the function
g(p) may be resolved into a sum of terms of
the form G(p)e^{−pa} where a > 0 and G(p) is a
rational algebraic function with real coefficients.
Making use of the translation theorem,
the problem of evaluating L^{−1}[g(p)] reduces to
that of evaluating L^{−1}[G(p)]. Now, G(p) may
be resolved into a sum of terms of the form
p^m or 1/(p − α)^{m+1} where m = 0, 1, 2, .... We
shall consider these two cases separately.
The case G(p) = p^m will be treated by means
of (12) and some limiting processes. In Section
A.1 the unit impulse was regarded as the
limit of a rectangular pulse of duration T and
amplitude 1/T. By means of (12) the Laplace
transform of such a pulse over the interval
0 < t < T is
$$\frac{1 - e^{-pT}}{pT}.$$
Hence
$$L[\delta_0(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{pT} = 1.$$
Formally therefore
$$L^{-1}[1] = \delta_0(t). \qquad (14)$$
Similarly, the Laplace transform of a pulse
over the interval a < t < a + T where a > 0 is
$$e^{-pa}\, \frac{1 - e^{-pT}}{pT}.$$
Hence
$$L[\delta_0(t - a)] = e^{-pa} \lim_{T \to 0} \frac{1 - e^{-pT}}{pT} = e^{-pa}.$$
Formally therefore
$$L^{-1}\left[e^{-pa}\right] = \delta_0(t - a).$$
The last result follows directly from (14) using
the translation theorem.
Next, let
$$\delta_1(t) = \lim_{T \to 0} \frac{\delta_0(t) - \delta_0(t - T)}{T}.$$
This is the limiting case, as shown in Figure 5,
of two impulses of strengths 1/T and −1/T
separated by a time interval T. It may be called
an impulse of second order.
Figure 5. An impulse doublet.
By (12) and the previous results
$$L[\delta_1(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{T} = p.$$
Formally therefore
$$L^{-1}[p] = \delta_1(t). \qquad (15)$$
Proceeding in this fashion we may define an
impulse of (m + 1)th order as
$$\delta_m(t) = \lim_{T \to 0} \frac{\delta_{m-1}(t) - \delta_{m-1}(t - T)}{T} \qquad (16)$$
and we may then show that
$$L[\delta_m(t)] = p^m.$$
Formally therefore
$$L^{-1}[p^m] = \delta_m(t). \qquad (17)$$
This disposes of the case G(p) = p^m where
m = 0, 1, 2, ....
The case G(p) = 1/(p − α)^{m+1} will be treated
by means of (13) and Jordan's lemma.
Jordan's Lemma
If all the singularities of G(p) can be enclosed
by a circle of finite radius with center at
the origin, and if G(p) → 0 uniformly with
respect to arg p as |p| → ∞, then
$$\lim_{R \to \infty} \int_{\Gamma} G(p)\, e^{pt}\, dp = 0$$
where Γ is a semicircle of radius R, with center
at the origin, to the right of the imaginary axis
if t is negative, to the left of the imaginary axis
if t is positive.
By the use of this lemma the contour of integration
in (13) may be closed and the integration
may then be performed by the method of
residues. In the case
$$G(p) = \frac{1}{(p - \alpha)^{m+1}}$$
where m = 0, 1, 2, ..., we readily obtain
$$L^{-1}\left[\frac{1}{(p - \alpha)^{m+1}}\right] = \begin{cases} 0 & t < 0 \\ \dfrac{t^m e^{\alpha t}}{m!} & t > 0. \end{cases} \qquad (18)$$
An important special case of (18), corresponding
to α = 0, is
$$L^{-1}\left[\frac{1}{p^{m+1}}\right] = \begin{cases} 0 & t < 0 \\ \dfrac{t^m}{m!} & t > 0. \end{cases} \qquad (19)$$
Another useful theorem which is readily
established by means of (12) and (13) is
Borel's theorem.
Borel's Theorem
If g(p), g_1(p), g_2(p) are the Laplace transforms
of f(t), f_1(t), f_2(t), respectively, and if
$$g(p) = g_1(p)\, g_2(p)$$
then
$$f(t) = \int_{0-}^{t+} f_1(t - \lambda)\, f_2(\lambda)\, d\lambda = \int_{0-}^{t+} f_1(\tau)\, f_2(t - \tau)\, d\tau.$$
The functions f_1(t) and f_2(t) are subject to
conditions which permit the inversion of the
order of integration in the following proof.
However, these conditions are seldom of any
concern. We have
$$f(t) = L^{-1}\left\{ g_1(p) \cdot L[f_2(t)] \right\}.$$
Inverting the order of integration and noting
that
$$\frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} g_1(p)\, e^{p(t - \lambda)}\, dp = \begin{cases} 0 & \text{if } \lambda > t \\ f_1(t - \lambda) & \text{if } \lambda < t \end{cases}$$
we obtain the result stated in the theorem.
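Borel's theorem may be checked on a simple pair of functions. In the sketch below the choices f_1(t) = e^{−t} and f_2(t) = e^{−2t} are hypothetical; their transforms are 1/(p + 1) and 1/(p + 2), and by (18) the inverse transform of the product 1/((p + 1)(p + 2)) = 1/(p + 1) − 1/(p + 2) is e^{−t} − e^{−2t}. The code approximates the convolution integral by a midpoint sum and compares it with that expression.

```python
import math

def convolve_at(f1, f2, t, dt=1e-4):
    """Midpoint-sum approximation of the integral over (0, t) of f1(t - x) * f2(x) dx."""
    n = int(round(t / dt))
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * dt
        total += f1(t - x) * f2(x) * dt
    return total

def f1(t):
    return math.exp(-t)        # hypothetical f_1, transform 1/(p + 1)

def f2(t):
    return math.exp(-2.0 * t)  # hypothetical f_2, transform 1/(p + 2)

t_check = 1.5
convolved = convolve_at(f1, f2, t_check)
from_product = math.exp(-t_check) - math.exp(-2.0 * t_check)
```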
A.5 ALTERNATIVE EXPRESSION OF THE
RESPONSE-TO-SIGNAL RELATIONSHIP
The result (8) obtained in Section A.2 suggests
an operational expression of the form
$$V(t) = Y(p) \cdot E(t) \qquad (20)$$
for the response-to-signal relationship whatever
the signal E(t) might be. If the equivalence
of this operational expression to (2) is
taken as a matter of definition we may readily
discover the nature of the implied operation.
In the light of Borel's theorem, (2) may be
expressed in the form
$$L[V(t)] = L[W(t)] \cdot L[E(t)]$$
under the permissible assumption that E(t) ≡ 0
when t < 0. Hence
$$V(t) = L^{-1}\left\{ L[W(t)] \cdot L[E(t)] \right\}$$
or, by (11)
$$V(t) = L^{-1}\left\{ Y(p) \cdot L[E(t)] \right\}. \qquad (21)$$
This is, therefore, in general the meaning of
the operational expression (20).^d
^d We note that if S(p) = L[E(t)], the operational expression
$$V(t) = S(p) \cdot W(t)$$
is equivalent to (20). This form is used in Section 10.4
and in Appendix B.
The symmetry of the impulsive admittance
is expressed by
$$W(T - t) = W(t).$$
Since W(t) = 0 when t < 0, it must be so also
when t > T. Hence
$$Y(p) = \int_{0}^{T/2} W(t)\, e^{-pt}\, dt + \int_{T/2}^{T} W(t)\, e^{-pt}\, dt.$$
By a change of variable of integration the second
term may be expressed in the form
$$\int_{0}^{T/2} W(T - t)\, e^{-p(T - t)}\, dt$$
or, because of the symmetry,
$$e^{-pT}\int_{0}^{T/2} W(t)\, e^{pt}\, dt.$$
Hence, if the first term in Y(p) be
$$Y_1(p) = \int_{0}^{T/2} W(t)\, e^{-pt}\, dt$$
we have
$$Y(p) = Y_1(p) + Y_1(-p)\, e^{-pT} = \left[ Y_1(p)\, e^{pT/2} + Y_1(-p)\, e^{-pT/2} \right] e^{-pT/2}.$$
At real frequencies (p = iω) the bracketed factor
is evidently an even real function of ω.
Hence
$$Y(i\omega) = Q(\omega^2) \cdot e^{-i\omega T/2}. \qquad (24)$$
Apart from discontinuities in the phase angle
of the transmission function at real frequencies
ω for which Q(ω²) is zero, the phase angle is
proportional to frequency. Such a transmission
function is referred to as a linear phase
transmission function. Sinusoidal components of the
signal, of frequencies less than the lowest
frequency at which Q(ω²) vanishes, suffer phase
retardations in transmission in proportion to
their frequencies. These components therefore
contribute no delay distortion. They are delayed
by a uniform amount, just as they are in a
properly terminated distortionless, uniform
transmission line, although in the case of (24)
they contribute amplitude or loss distortion
through Q(ω²). The delay in (24) is just half
of the "smoothing time" T.
A.8 SERIES RELATIONSHIPS BETWEEN
IMPULSIVE ADMITTANCE AND
TRANSMISSION FUNCTION
Two useful series relationships between impulsive
admittances and transmission functions
will be derived in this section.
Assume that W(t) admits the series expansion
$$W(t) = A_0 + A_1 t + \cdots + A_m \frac{t^m}{m!} + \cdots \qquad (25)$$
for small positive values of t. Then by (11)
and (19)
$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p^2} + \cdots + \frac{A_m}{p^{m+1}} + \cdots \qquad (26)$$
If A_0 ≠ 0 the transmission cannot drop off
faster than 6 db per octave as the frequency
increases indefinitely. If the transmission is to
drop off ultimately at the rate of 6k db per
octave all of the A's up to and including A_{k−2}
must be zero. This is to say that the impulsive
admittance and all of its derivatives of orders
up to and including the (k − 2)th must vanish
at t = 0.
Next, let us suppose that the impulsive
admittance and all of its derivatives of orders up
to and including the (k − 2)th are continuous
through all values of t including t = 0 except
that the (k − 2)th derivative is discontinuous
only at t = a. We may resolve the impulsive
admittance into the sum W_1(t) + W_2(t) where
W_1(t) and all of its derivatives of orders up to
and including the (k − 2)th are continuous
through all values of t including t = 0, while
W_2(t) = 0 for all values of t < a. Then, for
small positive values of t − a
$$W_2(t) \approx A_{k-2} \frac{(t - a)^{k-2}}{(k - 2)!} \qquad (A_{k-2} \neq 0)$$
whence
$$L[W_2(t)] \approx \frac{A_{k-2}\, e^{-pa}}{p^{k-1}}.$$
Hence the transmission cannot drop off ultimately
faster than 6(k − 1) db per octave. We
may summarize these results in the asymptotic
loss theorem.
Asymptotic Loss Theorem
If the transmission is to drop off ultimately
at the rate of 6k db per octave as the frequency
increases indefinitely, the impulsive admittance
and all of its derivatives of orders up to and
including the (k − 2)th must be continuous
through all values of t including t = 0.
Discontinuities in W(t) or in some deriva-
tive of W(t) cannot occur except at t = 0 in
the case of physical lumped element networks.
Practically, however, rapid changes in W(t)
or in some derivative of W(t), at any value of
t, may be expected to be associated with much
the same behavior of the transmission at rea-
sonably high frequencies. As an example con-
sider the case
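$$W(t) = \frac{\alpha\beta}{\beta - \alpha}\left( e^{-\alpha t} - e^{-\beta t} \right) \qquad (\beta > \alpha > 0),$$
$$Y(p) = \frac{\alpha\beta}{(p + \alpha)(p + \beta)}.$$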
W(t) is continuous through t = 0 as long as β
is finite but becomes discontinuous there in the
limit as β → ∞. The first derivative of W(t)
is discontinuous through t = 0 even when β is
finite. The ultimate slope of the transmission is
12 db per octave, in accordance with the
asymptotic loss theorem, but in the range
α < ω < β the transmission appears to have a
slope of only 6 db per octave.
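The two slopes mentioned in this example can be seen numerically. The sketch below assumes a transmission function with poles at −α and −β normalized to unity at zero frequency, Y(p) = αβ/((p + α)(p + β)); the normalization and the numerical values of α and β are assumptions of the illustration and do not affect the slopes.

```python
import math

def gain_db(omega, alpha, beta):
    """Gain in db of the assumed Y(p) = alpha*beta / ((p + alpha)(p + beta)) at p = i*omega."""
    y = (alpha * beta) / (complex(alpha, omega) * complex(beta, omega))
    return 20.0 * math.log10(abs(y))

def slope_db_per_octave(omega, alpha, beta):
    """Change in gain when the frequency is doubled from omega to 2*omega."""
    return gain_db(2.0 * omega, alpha, beta) - gain_db(omega, alpha, beta)

ALPHA, BETA = 1.0, 1.0e4  # hypothetical, widely separated values

mid_slope = slope_db_per_octave(1.0e2, ALPHA, BETA)  # inside the range alpha < omega < beta
far_slope = slope_db_per_octave(1.0e7, ALPHA, BETA)  # far above beta
```

With these values the slope inside the range is close to −6 db per octave and the ultimate slope is close to −12 db per octave.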
The importance of the observations made in
the preceding paragraph, in the design of a
network, is that if we attempt to approximate
a W(t) which has a discontinuity in a deriva-
tive of lower order at t = a than at t = 0, the
fact that the physical approximation must have
continuous derivatives of all orders and through
all values of t except t = 0 is not very significant.
The ultimate slope of the transmission
may not be reached until the frequency is too
high to be of any importance.
Another useful relationship between impulsive
admittance and transmission function follows
from the assumption that
$$\int_{0-}^{\infty} t^m\, W(t)\, dt$$
is finite for m = 1, 2, .... If we expand the
exponential in
$$Y(p) = \int_{0-}^{\infty} W(t)\, e^{-pt}\, dt$$
into a power series in pt we get
$$Y(p) = M_0 - M_1 p + \frac{M_2}{2!} p^2 - \frac{M_3}{3!} p^3 + \cdots \qquad (27)$$
where
$$M_m = \int_{0-}^{\infty} t^m\, W(t)\, dt. \qquad (28)$$
The quantity M_m is the mth moment of the
impulsive admittance.
When M_0 = 1 we speak of the response of the
network as a weighted average of the impressed
signal, and speak of the impulsive admittance
W(t) as the weighting function.
A.9 PHYSICAL RESTRICTIONS ON THE
TRANSMISSION FUNCTION
The transmission function Y(p) of a lumped
element network is a rational algebraic function
of p. It is real for real values of p (A.2).
Hence, the coefficients must be real, and therefore
the roots and poles must either be real or
occur in conjugate complex pairs.
Such a function may be expanded into the
sum of a polynomial and a rational function
whose numerator is of lower degree than the
denominator. The latter may therefore be properly
expanded into partial fractions. For a
partial fraction of the form
$$\frac{1}{(p - \alpha)^m} \qquad \text{where } m = 1, 2, \ldots$$
the contribution to the impulsive admittance
W(t) is by (18)
$$L^{-1}\left[\frac{1}{(p - \alpha)^m}\right] = \frac{t^{m-1} e^{\alpha t}}{(m - 1)!} \qquad (t > 0).$$
For a pair of partial fractions of the form
$$\frac{A + iB}{(p - \alpha + i\beta)^m} + \frac{A - iB}{(p - \alpha - i\beta)^m}$$
the contribution to the impulsive admittance is
$$\frac{2 t^{m-1} e^{\alpha t}}{(m - 1)!}\,(A \cos \beta t + B \sin \beta t).$$
Since the impulsive admittance is the response
to an impulsive signal it is clear that for
a stable network the impulsive admittance must
be free of terms which increase indefinitely
with time, either on account of an amplitude
factor of the form e^{αt} where α > 0, or, in the
event that α = 0, on account of an amplitude
factor of the form t^{m−1} where m > 1. Hence, the
physical restrictions on the transmission function
are:
1. No poles with positive real parts.
2. Poles on the imaginary p axis must be
simple.^e
The poles of a passive transmission function
correspond to modes of free motion. Each of
them may be shown to satisfy an equation of
the form
$$pT + F + \frac{V}{p} = 0$$
where T, F, V are positive quantities whose
values depend upon the particular mode and
its activity. However, T is zero in the absence
of kinetic energy, F is zero in the absence of
energy dissipation, and V is zero in the absence
of potential energy. It follows that in the
absence of coils or in the absence of condensers,
the transmission function must have poles only
on the negative real p axis.
^e Poles on the imaginary p axis must also be ruled
out on the ground that persistent transients cannot be
tolerated any more than growing transients.
For extremely narrow-band, low-pass appli-
cations, such as data smoothing, it is not prac-
ticable to build networks which call for coils
because these generally turn out to be of many
thousands of henries in inductance. The exclu-
sion of coils from these applications does not,
however, rule out transmission functions with
complex poles. These may be realized with RC
networks in feedback amplifier circuits as is
shown in Chapter 12.
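The two physical restrictions on the poles stated in this section can be expressed as a simple test. The following is only a sketch: the representation of the poles as (value, multiplicity) pairs, the tolerance, and the function name are assumptions made for the illustration.

```python
def satisfies_pole_restrictions(poles, tol=1e-9):
    """Check the two restrictions on a list of (pole, multiplicity) pairs.

    1. No pole may have a positive real part.
    2. A pole on the imaginary p axis must be simple (multiplicity 1).
    """
    for pole, multiplicity in poles:
        real_part = complex(pole).real
        if real_part > tol:
            return False  # would give a term growing as exp(alpha * t), alpha > 0
        if abs(real_part) <= tol and multiplicity > 1:
            return False  # would give a term growing as t**(m - 1)
    return True

# Hypothetical examples.
damped = [(-1.0, 2), (complex(-0.5, 2.0), 1), (complex(-0.5, -2.0), 1)]
growing = [(0.1, 1)]
simple_on_axis = [(complex(0.0, 3.0), 1), (complex(0.0, -3.0), 1)]
double_on_axis = [(0.0, 2)]

results = [
    satisfies_pole_restrictions(damped),
    satisfies_pole_restrictions(growing),
    satisfies_pole_restrictions(simple_on_axis),
    satisfies_pole_restrictions(double_on_axis),
]
```

Simple poles on the imaginary axis pass this test although, as the footnote to restriction 2 remarks, they would also be ruled out in practice because persistent transients cannot be tolerated.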
A.10 QUASI-DISTORTIONLESS
TRANSMISSION NETWORKS
A quasi-distortionless transmission network
is one which is distortionless only in a certain
sense. This sense will be made clear in this
section.
Let
$$Y(p) = \frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n}. \qquad (29)$$
This may also be written in the form
$$Y(p) = 1 + c_1 p + \frac{c_2}{2!} p^2 + \cdots + \frac{c_r}{r!} p^r + p^{r+1} g(p). \qquad (30)$$
Obviously g(p) will be a rational function with
the same denominator as Y(p) and a numerator
of (n − 1)th degree. If we now apply a signal
of the form
$$E(t) = \begin{cases} 0 & \text{for } t < 0 \\ t^r & \text{for } t > 0 \end{cases}$$
the response, by (21), will be
$$V(t) = t^r + r c_1 t^{r-1} + \frac{r(r-1)}{2!} c_2 t^{r-2} + \cdots + c_r + r!\, L^{-1}[g(p)] \qquad (t > 0).$$
If the coefficients in the rational expression for
Y(p) are such that
$$c_1 = t_f,\quad c_2 = t_f^2,\quad \ldots,\quad c_r = t_f^r \qquad (31)$$
then
$$V(t) = (t + t_f)^r + r!\, L^{-1}[g(p)] \qquad (t > 0). \qquad (32)$$
The second term vanishes exponentially with
time. The first term is an advanced or a retarded
facsimile of the applied signal according
to whether t_f is positive or negative. We
shall say that Y(p) is the transmission function
of a network which is quasi-distortionless
to the signal t^r.
Obviously a transmission network which is
quasi-distortionless to the signal t^r must also be
quasi-distortionless to every signal t^s where s
is a positive integer less than r, including zero.
Hence we may state the quasi-distortionless
transmission theorem.
Quasi-Distortionless Transmission
Theorem
If the signal
$$E(t) = \begin{cases} 0 & \text{for } t < 0 \\ \text{polynomial of degree } r \text{ at most in } t & \text{for } t > 0 \end{cases}$$
is applied to a "quasi-distortionless transmission
network of order r," the response will be
of the form
$$V(t) = E(t + t_f) + O(e^{-t}) \qquad \text{for } t > 0,$$
where O(e^{−t}) stands for terms which vanish
exponentially with time.
If t_f > 0 the transmission network is a predictor
for polynomials of degree r at most.
However, it does not begin to predict properly
until some time has elapsed after the start of
the signal, or of a new analytic segment of the
signal; that is, until the transients have subsided
sufficiently.
If t_f = 0 the transmission network may be
regarded as a delay-corrected smoother for
polynomials of degree r at most. This is obtained
simply by taking
$$a_1 = b_1,\quad a_2 = b_2,\quad \ldots,\quad a_r = b_r \qquad (33)$$
in (29).
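The coefficients c_1, c_2, ... of the expansion (30) can be obtained from the a's and b's of (29) by power-series division. The sketch below is an illustration under that reading; the function name and the numerical coefficients are hypothetical. It confirms that taking a_1 = b_1 gives c_1 = 0 (a delay-corrected smoother of order 1), and that for a first-order example c_1 = a_1 − b_1 plays the part of t_f.

```python
from math import factorial

def series_coefficients(a, b, order):
    """Return [c_0, ..., c_order] with Y(p) = sum of c_k * p**k / k!.

    a and b list the numerator and denominator coefficients of Y(p) in
    ascending powers of p, as in (29); b[0] must be nonzero.
    """
    y = []  # plain Taylor coefficients y_k of Y(p)
    for k in range(order + 1):
        acc = a[k] if k < len(a) else 0.0
        for j in range(1, k + 1):
            if j < len(b):
                acc -= b[j] * y[k - j]
        y.append(acc / b[0])
    return [factorial(k) * y_k for k, y_k in enumerate(y)]

# Hypothetical delay-corrected smoother of order 1: a_1 = b_1.
smoother = series_coefficients([1.0, 0.5], [1.0, 0.5, 0.1], order=2)

# Hypothetical first-order predictor: c_1 = a_1 - b_1 = 1.0 acts as t_f.
predictor = series_coefficients([1.0, 1.5], [1.0, 0.5], order=2)
```

In the second example c_2 differs from c_1 squared, so that network is quasi-distortionless to the signal t but not to t².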
A.11 VARIABLE LINEAR NETWORKS
A variable linear transmission network is
one in which the response V(t) is related to the
impressed signal E(t) by the linear differential
equation (1) with coefficients which are prescribed
functions of t. The solutions of such a
differential equation also obey the superposi-
tion principle. Thus it is possible in this case
also to formulate the response of the network
to any signal in terms of its response to a
standard impulsive signal.
The response of a variable network to an
impulse or any form of signal depends, however,
on the time at which the signal is applied.
For an impulsive signal applied at time λ the
response at time t will be represented by
W(t, λ). This is still called the "impulsive
admittance." In the theory of linear differential
equations it is known as a Green's function.
Physically, it must be identically zero for
t < λ.
The superposition theorem may now be written
in the form
$$V(t) = \int_{0}^{t+} E(\lambda) \cdot W(t, \lambda)\, d\lambda \qquad (34)$$
provided the network has been properly designed
and set into operation at t = 0. If
$$\int_{0}^{t+} W(t, \lambda)\, d\lambda = 1$$
for all values of t > 0, the response may be
interpreted as a weighted average of the signal.
We note that in order to interpret the
response as a weighted average of the signal,
it is now no longer necessary to take the lower
limit in (34) as −∞, as it was in the case of
(2) for a fixed network. In other words, a
variable network can be designed and set into
operation at any time so that components of
the signal which arrive before that time are
completely ignored.
The analysis and design of variable linear
networks are in general much more difficult
than those of fixed linear networks. This is due
largely to the fact that there does not yet exist
a technique corresponding to the steady-state
and operational methods used in connection
with fixed networks. However, there is a class
of variable networks whose analysis and design
are greatly facilitated by the fact that they are
related to fixed networks by a transformation
of the time variable.
Consider the linear differential equation
$$b_n \frac{d^n V}{dz^n} + b_{n-1} \frac{d^{n-1} V}{dz^{n-1}} + \cdots + b_1 \frac{dV}{dz} + V = E$$
with constant coefficients. With appropriate
restrictions on the roots of the characteristic
function
$$b_n \lambda^n + b_{n-1} \lambda^{n-1} + \cdots + b_1 \lambda + 1$$
it represents the response-to-signal relationship
in a fixed network, if z is proportional
directly to time. However, if z is a more general
function of the time, it will correspond to
a variable network. The kind of transformation
which is desired here is one which transforms
the range −∞ < z < +∞ into the range
0 < t < +∞ with a one-to-one correspondence.
Thus, we may take z = log θ(t) where θ(t) is a
positive monotonic increasing function of t in
the range 0 < t < +∞, with θ(t) → 0 as t → 0.
Several examples of θ(t), including θ(t) = t, are
considered in detail in Chapter 14.
APPENDIX B
THEORETICAL MODIFICATIONS OF SMOOTHING FUNCTIONS TO FIT
NONUNIFORM NOISE SPECTRA
BEST smoothing or weighting functions have
been determined in Chapters 10 and 11
under the assumption of random noise with flat
spectrum. It has not been worth while in prac-
tice to base the choice of best weighting func-
tions on any more elaborate considerations of
actual noise spectra, for at least three reasons :
1. The effectiveness of a smoothing network
shape of the weighting function.
2. Noise spectra are subject to variations,
due to factors which it is not desirable in prac-
tice to attempt to control.
3. Elaborate smoothing functions require
elaborate networks with close tolerances on ele-
ment values.
Nevertheless, the theory of smoothing pre-
sented in this monograph would not be com-
plete without showing how more general shapes
of noise spectra can be considered. Two meth-
ods are presented here, which are generaliza-
tions of those presented in Sections 10.3 and
10.4, respectively.
B.1 PHILLIPS AND WEISS THEORY⁷
Let g(t) be the tracking error, and W(t) the
impulsive admittance of a smoothing and prediction
circuit with smoothing time T. Then
the error in prediction due to tracking error
only, is
$$V(t) = \int_{0}^{T} g(t - \tau) \cdot W(\tau)\, d\tau.$$
The impulsive admittance W(τ) will depend
also upon the time of flight which, for purposes
of analysis, is assumed to be constant. The
mean square error is then
$$\overline{V^2} = \lim_{L \to \infty} \frac{1}{2L}\int_{-L}^{L} V^2(t)\, dt = \int_{0}^{T}\!\!\int_{0}^{T} W(\tau_1) \cdot C(\tau_1 - \tau_2) \cdot W(\tau_2)\, d\tau_1\, d\tau_2$$
where
$$C(x) = \lim_{L \to \infty} \frac{1}{2L}\int_{-L}^{L} g(\lambda) \cdot g(\lambda + x)\, d\lambda. \qquad (1)$$
C(x) is the autocorrelation of the error time-function
g(λ).
For an nth order smoothing and prediction
circuit $\overline{V^2}$ is now minimized with respect to the
impulsive admittance under the restrictions*
$$\int_{0}^{T} \tau^m\, W(\tau)\, d\tau = (-t_f)^m \qquad (m = 0, 1, 2, \ldots, n). \qquad (2)$$
Hence W(τ) must satisfy the integral equation
$$\int_{0}^{T} C(t - \tau) \cdot W(\tau)\, d\tau = k_0 + k_1 t + \cdots + k_n t^n \qquad (0 \le t \le T)$$
where the k_m are constants to be determined.
Now, if
$$\int_{0}^{T} C(t - \tau) \cdot W_m(\tau)\, d\tau = t^m \qquad (0 \le t \le T) \quad (m = 0, 1, 2, \ldots, n) \qquad (3)$$
then
$$W(\tau) = k_0 W_0(\tau) + k_1 W_1(\tau) + \cdots + k_n W_n(\tau). \qquad (4)$$
The procedure is then to determine C(x) from
(1), the W_m(τ) from (3), the k_m from (2) and
(4), and finally W(τ) from (4). It may be
noted that, in general, every k_m will be a polynomial
of nth degree in t_f. Hence the W_m(τ)
appearing here are not the same as those defined
in Chapter 11, although W(τ) should be
the same if the same W_0(τ) is used in Chapter
11.
A difficulty of the theory given above is in
the solution of the integral equations (3). This
difficulty is avoided in the theory given in the
next section. However, the integral equations
are easily solved in case of flat random noise,
when C(x) is simply an impulse of strength K
say, at x = 0. Then
$$W_m(\tau) = \frac{\tau^m}{K}, \qquad 0 < \tau < T.$$
Since the strength is irrelevant, it may be taken
equal to T so that W_0(τ) will be normalized.
* These follow from the discussions in Sections A.8
and A.10, especially equations (27), (28), (30), and
For a linear prediction circuit it is then found
that
$$W(\tau) = 2\left(2 + \frac{3 t_f}{T}\right) W_0(\tau) - \frac{6}{T}\left(1 + \frac{2 t_f}{T}\right) W_1(\tau).$$
Putting T = 1 this may be expressed as
$$W(\tau) = W_0(\tau) + G_1(-t_f) \cdot W_1(\tau)$$
in terms of the G_m(τ) and W_m(τ) of Section
11.3.
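The linear-prediction weighting function can be checked against the restrictions (2) with exact rational arithmetic. The sketch below assumes flat noise with strength K = T, so that W_0(τ) = 1/T and W_1(τ) = τ/T, and takes the coefficients of W_0 and W_1 to be 2(2 + 3t_f/T) and −(6/T)(1 + 2t_f/T), the values that follow from solving the two restrictions with n = 1. The function names and the numerical values of t_f and T are hypothetical.

```python
from fractions import Fraction

def linear_prediction_coefficients(t_f, T):
    """Coefficients k0, k1 of W(tau) = k0 * W_0(tau) + k1 * W_1(tau)."""
    k0 = 2 * (2 + 3 * t_f / T)
    k1 = -(Fraction(6) / T) * (1 + 2 * t_f / T)
    return k0, k1

def moment(k0, k1, T, m):
    """Exact integral over (0, T) of tau**m * (k0 + k1 * tau) / T."""
    return (k0 * T ** (m + 1) / (m + 1) + k1 * T ** (m + 2) / (m + 2)) / T

T = Fraction(4)       # hypothetical smoothing time
t_f = Fraction(3, 2)  # hypothetical time of flight

k0, k1 = linear_prediction_coefficients(t_f, T)
zeroth = moment(k0, k1, T, 0)  # restriction (2) with m = 0 requires 1
first = moment(k0, k1, T, 1)   # restriction (2) with m = 1 requires -t_f
```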
B.2 SYMMETRY OF BEST SMOOTHING
FUNCTIONS
The theory of Phillips and Weiss offers the
most direct proof that the best smoothing or
weighting function must be symmetrical, regardless
of the noise power spectrum. The
situation is that of minimizing (1) under only
one of the restrictions (2), viz., the normalizing
condition
$$\int_{0}^{T} W(\tau)\, d\tau = 1. \qquad (5)$$
The weighting function is therefore determined,
up to a constant scale factor, by the
condition that
$$\int_{0}^{T} C(t - \tau) \cdot W(\tau)\, d\tau = k, \qquad (6)$$
where k is a constant. Substituting T − t for t
and T − τ for τ, we have
$$\int_{0}^{T} C(\tau - t) \cdot W(T - \tau)\, d\tau = k. \qquad (7)$$
Since C(−x) = C(x), and since W(τ) is determined
uniquely by (6) and (5), it follows
from (6) and (7) that
$$W(T - \tau) = W(\tau). \qquad (8)$$
GENERALIZATION OF ELEMENTARY
PULSE METHOD
The noise power transmitted through a network
may be expressed in the familiar form
\[ P = \int_0^\infty N(\omega^2)\, |Y(i\omega)|^2\, d\omega \]
where $N(\omega^2)$ is the noise power spectrum and
$Y(p)$ is the transmission function of the network.
Assuming that $N(\omega^2)$ is a rational function
of $\omega^2$, which is finite at all finite values of
$\omega$ including zero, it is possible to determine a
rational function $S(p)$, which has no poles on
or to the right of the imaginary axis in the
$p$-plane with the exception of the point at infinity,
and such that
\[ |S(i\omega)|^2 = N(\omega^2). \]
It may be readily shown that
\[ P = \pi \int_0^\infty [F(t)]^2\, dt \tag{9} \]
where $F(t)$ is related to the impulsive admittance
$W(t)$ by the operational equation
\[ F(t) = S(p) \cdot W(t). \tag{10} \]
The problem is now to minimize (9) under the
restriction
^ / Wit)di = 1 when <o > 1. (ll)
Let
where
\[ Q(p) = (p + \alpha_1)(p + \alpha_2) \cdots (p + \alpha_m) \]
\[ R(p) = (p + \beta_1)(p + \beta_2) \cdots (p + \beta_n) \]
and $\beta$ is of no consequence. One or more of the
$\alpha$'s, but none of the $\beta$'s, may be zero. Since the
existence of the integral in (9) imposes the
requirement that $F(t)$ have no discontinuities
of higher type than finite jumps in the range
$0 \le t < \infty$, the continuity conditions on $W(t)$
in (10) must depend upon the difference between
$m$ and $n$ in the expressions for $Q(p)$ and
$R(p)$.
If $m > n$, it is fairly obvious that $W(t)$ must
be differentiable, in the ordinary sense, exactly
$m - n$ times. In other words, $W(t)$ and all its
derivatives up to and including the $(m - n - 1)$th
must be continuous, but the $(m - n)$th
derivative may have finite jumps. If $m < n$ we
must consider the introduction into $W(t)$ of
discontinuities of higher type than finite jumps.
These discontinuities arise in the formal extension
of the concept of differentiation to
functions containing finite jumps.
If a function $\phi(t)$ has a finite jump of amplitude
$A_0$ at $t = a$, the value of $\phi'(t)$ at that
point will be indicated formally as $A_0 \cdot \delta_0(t - a)$
where $\delta_0(t - a)$ is a unit impulse at $t = a$. If
$\phi'(a + 0) - \phi'(a - 0) = A_1$, the value of $\phi''(t)$
at $t = a$ will be indicated formally as
$A_0 \cdot \delta_1(t - a) + A_1 \cdot \delta_0(t - a)$ where $\delta_1(t - a)$ is a
unit doublet at $t = a$. And so on, for higher derivatives
of $\phi(t)$.
The expression (9) is a minimum under the
restriction (11) if $W(t)$ satisfies the differential
equation
\[ Q(p) \cdot Q(-p)\, W(t) = \text{const.} \tag{12} \]
when $0 < t < 1$ and $Y(p)$ the condition
\[ \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} S(p) \cdot S(-p) \cdot Y(p)\, e^{pt}\, dp = \text{const.} \quad \text{when } 0 < t < 1. \tag{13} \]
The restriction (11) itself requires that
$W(t) = 0$ when $t > 1$, and
\[ \int_{0-}^{1+} W(t)\, dt = 1. \tag{14} \]
Case I. ($n = 0$)
The general solution of (12) contains $2m + 1$
constants of integration which are determined
by (14) and the $2m$ continuity conditions that
$W(t)$ and all of its derivatives up to and including
the $(m - 1)$th must vanish at $t = 0$ and
$t = 1$.
Case II. ($n \ne 0$, $m \ge n$)
The general solution of (12) contains $2m + 1$
constants of integration which are reduced
to $2n$ in number by (14) and the $2(m - n)$
continuity conditions that $W(t)$ and all of its
derivatives up to and including the $(m - n - 1)$th
must vanish at $t = 0$ and at $t = 1$. The
remaining $2n$ constants are determined by (13).
The left-hand member of (13) may be formulated
by the method of residues. The expression
for $Y(p)$ should first be separated
into two parts so that
\[ Y(p) = Y_L(p) + Y_R(p)\, e^{-p} \]
where $Y_L(p)$ and $Y_R(p)$ are rational functions
of $p$. When $0 < t < 1$ the path of integration may be closed
in the left-hand half of the $p$-plane for the first
part of $Y(p)$, and in the right-hand half for the
second part. Hence, if the sum of the residues
of $S(p) \cdot S(-p) \cdot Y_L(p)\, e^{pt}$ in the left-hand
half of the $p$-plane be denoted by $\Sigma_L$ and if the
sum of the residues of $S(p) \cdot S(-p) \cdot Y_R(p) \cdot e^{p(t-1)}$
in the right-hand half of the $p$-plane be
denoted by $\Sigma_R$, then the condition (13) reduces
to
\[ \Sigma_L - \Sigma_R = \text{const.} \tag{15} \]
Case III. ($n \ne 0$, $m < n$)
The $2m + 1$ constants of integration in the
general solution of (12) are first increased to
$2n + 1$ by appending the $2(n - m)$ singularities
\[ \delta_0(t),\ \delta_1(t),\ \ldots,\ \delta_{n-m-1}(t) \]
\[ \delta_0(t - 1),\ \delta_1(t - 1),\ \ldots,\ \delta_{n-m-1}(t - 1) \]
and then reduced to $2n$ by (14). The remainder
are determined by (13) or (15).
In formulating $Y(p)$ it may be noted that
\[ \mathcal{L}[\delta_m(t - a)] = e^{-ap}\, \mathcal{L}[\delta_m(t)] \qquad (a \ge 0). \]
Example of Case I
Let $S(p) = p^m$. The differential equation (12)
requires $W(t)$ to be a polynomial of degree $2m$.
The conditions at $t = 0$ require it to have a
factor $t^m$, and those at $t = 1$, a factor $(1 - t)^m$.
This leaves only (14) to be satisfied. Hence
\[ W(t) = \frac{(2m + 1)!}{(m!)^2}\, [t(1 - t)]^m \qquad (0 \le t \le 1) \]
in agreement with (8) of Section 10.8.
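The normalizing factor in the Case I example can be confirmed exactly. The sketch below assumes only the form $W(t) = c_m [t(1-t)]^m$ on $0 \le t \le 1$ and checks that $c_m = (2m+1)!/(m!)^2$ makes the integral in (14) equal to one. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def pulse_integral(m):
    """Exact integral of [t(1 - t)]**m over 0 <= t <= 1.

    Uses the binomial expansion [t(1-t)]^m = sum_k C(m,k) (-1)^k t^(m+k).
    """
    return sum(Fraction((-1) ** k * comb(m, k), m + k + 1) for k in range(m + 1))


def normalizer(m):
    """The factor (2m + 1)! / (m!)^2 appearing in the Case I example."""
    return Fraction(factorial(2 * m + 1), factorial(m) ** 2)


if __name__ == "__main__":
    for m in range(5):
        print(m, normalizer(m), normalizer(m) * pulse_integral(m))
```

For $m = 1$ the integral of $t(1-t)$ is $1/6$ and the factor is $6$, which gives the parabolic pulse $6t(1-t)$.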
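Example of Case II
Let
\[ S(p) = \frac{p + \alpha}{p + \beta}. \]
Then, by (12),
\[ W(t) = A_0 + A_1 e^{-\alpha t} + A_2 e^{\alpha t} \qquad (0 \le t \le 1). \]
Hence
\[ Y(p) = \frac{A_0}{p} + \frac{A_1}{p + \alpha} + \frac{A_2}{p - \alpha} - \left[\frac{A_0}{p} + \frac{A_1 e^{-\alpha}}{p + \alpha} + \frac{A_2 e^{\alpha}}{p - \alpha}\right] e^{-p}. \]
Condition (15) is satisfied if
\[ A_1 e^{-\alpha/2} = A_2 e^{\alpha/2} = \tfrac{1}{2} A_0 Q \]
where
\[ Q = \frac{\alpha^2 - \beta^2}{\beta} \cdot \frac{1}{\alpha \sinh\frac{\alpha}{2} + \beta \cosh\frac{\alpha}{2}}. \]
Hence
\[ W(t) = \frac{1 + Q \cosh \alpha\left(t - \tfrac{1}{2}\right)}{1 + \frac{2Q}{\alpha} \sinh\frac{\alpha}{2}} \qquad (0 \le t \le 1). \]
In the limit as $\alpha \to 0$,
\[ W(t) = \frac{1 + \frac{\beta^2}{2 + \beta}\, t(1 - t)}{1 + \frac{1}{6}\, \frac{\beta^2}{2 + \beta}} \qquad (0 \le t \le 1). \]
In terms of expressions (12), Section 11.3,
\[ W(t) = \frac{W_0(t) + k\, W_1(t)}{1 + k} \qquad (0 \le t \le 1), \]
where $k = \frac{1}{6}\left[\beta^2 / (2 + \beta)\right]$. This is reminiscent
of Stibitz's results mentioned in Section
10.3.
Example of Case III
Let $S(p) = 1/(p + \beta)$. Then, by (12) and the
rule for appending singularities in Case III,
\[ W(t) = A_0 + A_1 \delta_0(t) + A_2 \delta_0(t - 1) \qquad (0 \le t \le 1). \]
Hence
\[ Y(p) = \frac{A_0}{p} + A_1 - \left[\frac{A_0}{p} - A_2\right] e^{-p}. \]
Condition (15) is satisfied if
\[ A_1 = A_2 = \frac{A_0}{\beta}. \]
Hence
\[ W(t) = \frac{1 + \frac{1}{\beta}\, \delta_0(t) + \frac{1}{\beta}\, \delta_0(t - 1)}{1 + \frac{2}{\beta}} \qquad (0 \le t \le 1). \]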
BIBLIOGRAPHY
PART II
1. The Extrapolation, Interpolation and Smoothing of
Stationary Time Series with Engineering Applications,
Norbert Wiener, OSRD 370, Report to the
Services 19, Research Project DIC-6037, The Massachusetts
Institute of Technology, Feb. 1, 1942.
Div. 7-318.1-M2
1a. Ibid., Chapter 1.
2. The Analysis and Design of Servomechanisms,
Herbert Harris, Jr., OSRD 454, Progress Report to
the Services 23, The Massachusetts Institute of
Technology. Div. 7-321.1-M7
3. Behavior and Design of Servomechanisms, Gordon
S. Brown, OSRD 39, Progress Report 2, The Massachusetts
Institute of Technology, November 1940.
Div. 7-321.1-M1
4. Antiaircraft Director T-15, OEMsr-358, Report to
the Services 62, Western Electric Company, Inc.,
August 1943. Div. 7-112.2-M6
5. The Analysis and Synthesis of Linear Servomechanisms,
Albert C. Hall, OSRD 2097, Report to the
Services 64, The Massachusetts Institute of Technology,
May 1943. Div. 7-321.1-M5
6. Antiaircraft Director, T-15-E1, E. L. Norton,
OEMsr-358, Report to the Services 98, Bell Telephone
Laboratories, Inc., July 30, 1945.
Div. 7-112.2-M11
7. Theoretical Calculation on Best Smoothing of Position
Data for Gunnery Prediction, R. S. Phillips
and P. R. Weiss, OEMsr-262, AMP Note 11, Report
532, The Massachusetts Institute of Technology,
Radiation Laboratory, Feb. 16, 1944.
Div. 14-244.4-M1
AMP-703.4-M11
8. A Long Range, High-Angle Electrical Antiaircraft
Director [Final Report on T-10], C. A. Lovell,
NDCrc-127, Research Project 2, Division 7 Report
to the Services 80, Bell Telephone Laboratories,
Inc., June 24, 1944. Div. 7-112.2-M9
9. Flight Records of Pitch, Roll, and Yaw, taken in
a variety of bombers at Wright Field, Ohio, Sperry
Gyroscope Company, 1942-5.
10. Design and Performance of Data-Smoothing Network,
R. B. Blackman, OEMsr-262, Report MM-44-110-38,
[Bell Telephone Laboratories, Inc.], July 8,
1944.
11. Computer for Controlling Bombers from the
Ground, E. Lakatos and H. G. Och, OEMsr-262,
July 24, 1944.
12. A Position and Rate Smoothing Circuit for Ground-
Controlled Bombing Computers, R. B. Blackman,
OEMsr-262, Report MM-44-110-79, [Bell Telephone
Laboratories, Inc.], Aug. 21, 1944.
13. A Two-Servo Circuit for Smoothing Present Position
Coordinates and Rate in Antiaircraft Gun
Directors, R. B. Blackman, Contract W-30-069-ORD-1448,
Report MM-44-110-65, [Bell Telephone
Laboratories, Inc.], Sept. 27, 1944.
14. The Theory of Electrical Artificial Lines and Filters,
A. C. Bartlett, John Wiley and Sons, Inc.,
1931, p. 28.
15. Network Analysis and Feedback Amplifier Design,
H. W. Bode, D. Van Nostrand Company, 1945.
15a. Ibid., Chapters 7, 8, 13, and 14.
15b. Ibid., p. 313.
15c. Ibid., p. 326.
15d. Ibid., p. 301.
15e. Ibid., p. 38.
15f. Ibid., p. 12.
15g. Ibid., p. 78.
15h. Ibid., p. 110.
15i. Ibid., p. 133.
15j. Ibid., Chapter 6.
16. Fundamental Theory of Servomechanisms, L. A.
MacColl, D. Van Nostrand Company, 1945.
17. Automatic Control Engineering, E. S. Smith, McGraw-Hill
Book Company, Inc., 1944.
18. Die Lehre von den Kettenbrüchen, O. Perron, B. G. Teubner,
Leipzig, 1913.
19. "Transient Oscillations in Wave Filters," J. R.
Carson and O. J. Zobel, Bell System Technical
Journal, July 1923.
20. "Harmonic Analysis of Irregular Motion," Norbert
Wiener, Journal of Mathematics and Physics,
Vol. 5, 1926, pp. 99-189.
21. "Generalized Harmonic Analysis," Norbert Wiener,
Acta Mathematica, Stockholm, Vol. 55, 1930,
pp. 117-258.
22. "Stochastic Problems in Physics and Astronomy,"
S. Chandrasekhar, Reviews of Modern Physics, Vol.
15, 1943, pp. 1-89.
23. "Mathematical Analysis of Random Noise," S. O.
Rice, Bell System Technical Journal, Vol. 23, 1944,
pp. 282-332.
23a. Ibid., Vol. 24, 1945, pp. 46-156.
Cover Sheet for Technical Memoranda
Research Department
Subject: The Transient Behavior of a Large Number of Four-
Terminal Unilateral Linear Networks Connected in
Tandem - Case 20876
MM-46-110-49
Date: April 10, 1946
Authors: C. L. Dolph, C. E. Shannon
Index No. W1.416
ROUTING:
1 - H.W.BW.B*F.-H.F#-Case Files
2 - Case Files
3 - L. G. Abraham - T. E. Brewer
4 - C. H. Elmendorf - H. K. Krist
5 - H. S. Black - F. B. Anderson
6 - G. N. Thayer - C. W. Harrison
7 - R. L. Dietzold
8 - L. A. MacColl
9 - B. M. Oliver
10 - C. L. Dolph
11 - C. E. Shannon
ABSTRACT
Asymptotic expressions for the transient
response of a long chain of four-terminal unilateral
linear networks connected in tandem subject to an
initial disturbance are developed and classified accord-
ing to the characteristics of the common transfer ratio.
It is shown that a necessary and sufficient condition
for the stability of the chain for all n is that the
transfer ratio be of the high pass type.
The mathematical results are applied to
chains of self-regulating telephone repeaters.
The Transient Behavior of a Large Number of Four-Terminal
Unilateral Linear Networks Connected in Tandem - Case 20878
MM-46-110-49
April 10, 1946
MEMORANDUM FOR FILE
Introduction
The transient response behavior of a long chain of
invariable four-terminal networks connected unilaterally in
tandem is of primary importance in the design of cross-country
wire communication systems, since the successful operation of
such equipment depends upon the rapid damping of transients
caused by suddenly applied inputs.
While the emphasis in the memorandum will be directed
toward coaxial systems consisting of self-regulating repeaters
spaced at 3-7 mile intervals and spanning distant points, the
results are of a more general nature and would apply, with
obvious modifications and corresponding interpretations, to any
configuration involving a large number of four-terminal linear
invariable networks connected unilaterally in tandem.
It will be shown that there are two fundamentally
different types of transient response possible depending upon
the gain characteristic of the transfer ratio of the individual
four-terminal linear networks comprising the system. The first
type of response while satisfactory is difficult to achieve in
practice because of the stringent requirements on the gain
characteristic of the transfer ratio. The second, a case often
encountered in practice, will be shown to be unsatisfactory in
general since it leads to build-up and overloading in any
physical system comprising a large number of such networks.
However, a guiding design principle will be suggested which,
it is believed, will enable us to minimize the worst of the
effects, and make the successful operation of a system of the
type envisaged here possible.
This memorandum is divided into two parts. In the
first the problem is defined physically and then formulated
mathematically. Following this, the history of the problem is
discussed briefly after which the new results are summarized.
Finally, this part concludes with a discussion of their inter-
pretation and implications for the coaxial system. The second
part presents the detailed mathematical arguments which led to
the new results of part one.
PART I
Statement of the Problem
The analysis in this memorandum is directed toward
the understanding of certain anomalous effects which a long
chain of self-regulating telephone repeaters may exhibit at its
output when the input end of the chain is subject to a transient
disturbance (Cf. Figure 1).
The gain settings of the repeaters in such a chain
are usually controlled by the level of a pilot frequency some-
where in the communication band and the regulation is designed
to compensate for low frequency phenomena (up to approximately
one cycle per second) such as the diurnal change in line resistance.
The repeaters in the chain are normally absolutely
stable devices so that any transient which is presented to the
input of any one of them will be evanescent in time at the
output of that repeater.
Since transients are not damped out instantaneously
even in absolutely stable devices, a transient disturbance at
the input to the first repeater in such a chain will be pro-
pagated down the chain. It has been experimentally observed
that under certain conditions the maximum amplitude of a transient
disturbance may increase as the disturbance is propagated
from one repeater to the next and in some cases there may be
many oscillations of sufficiently large amplitude to render the
system inoperative because of prolonged over-loading.
If the entire chain from its input to its output end
is considered as a whole, the chain does behave then in many
respects like an unstable non-linear device in spite of the
fact that each repeater in the chain is absolutely stable.
Since it is obvious that the above type of behavior
is at best undesirable in a cross-country link, it is necessary
that its cause be thoroughly understood and that all possible
steps be taken either to suppress it or, if this is not possible,
at least to minimize its effects.
Although it is not reasonable to expect that transient
oscillations can be kept from propagating down the line, or that
it is possible to isolate the line from all transient disturbances,
it is reasonable to seek a means of guaranteeing that the tran-
sients that are propagated down the line will never possess
amplitudes that exceed the magnitude of the original disturbance
or to seek a way to guarantee that the maximum response of the
transient oscillations will occur so shortly after the initial
disturbance that physical apparatus will be incapable of follow-
ing or distinguishing it from the unavoidable initial disturbance.
A way of guaranteeing the first of these will be discussed at
length and a suggestion will be made which it is felt will
guarantee the second, although no rigorous proof of this last
fact has yet been given.
Fig. 2 represents a schematic drawing of a typical
satisfactory type of transient response which might result from
a unit step input to the first unit of Fig. 1. Fig. 3, on the
other hand, represents a schematic drawing of a typical unsatis-
factory type of transient response which could result from the
same input to a system of the type of Fig. 1 which had different
characteristics. Briefly then, the problem to be discussed is
that of determining the relationships between the network
characteristics and the transient response for networks of the
form of Fig. 1.
Mathematical Formulation of the Problem
A sudden change in level in the pilot frequency
before the n-th repeater results in the modulation of this
frequency, changing it from its normal form
\[ A \sin \omega_c t \]
to
\[ A \sin \omega_c t\, [1 + f(t)] \]
where $f(t)$ represents the modulation introduced by the transient.
After passage through the n-th repeater, this last
expression is transformed into
\[ A \sin(\omega_c t + \varphi)\, [1 + g(t)], \]
where the repeater and regulator have (possibly) changed the
carrier by the addition of the phase angle $\varphi$ and have modified
the original envelope $A[1 + f(t)]$ into $A[1 + g(t)]$.
It is clear that from the standpoint of regulation
it is sufficient to limit discussion to the transformation
of $f(t)$ into $g(t)$.*
The exact relationship between $f(t)$ and $g(t)$, of course,
depends upon the characteristics of the repeater-regulator circuits
which are in general non-linear. However, for small signal
inputs their behavior may be satisfactorily represented by that
obtained from a linear invariable four-terminal network. Thus,
the chain of self-regulating repeaters may be replaced, for the
purpose of mathematical analysis, by a chain of linear invariable
four-terminal networks having a common transfer ratio $y(p)$. Thus,
the blocks of Fig. 1 will be idealized as being such linear four-terminal
networks throughout the analysis.
Because regulation is designed to compensate for low
frequency phenomena, certain characteristics that $y(p)$ should
possess are known a priori; namely:
(1) $y(p)$ must represent a high-pass system. That is,
$y(p) \to 1$ as $p \to \infty$.
(2) $y(0)$ should be zero if, in the terminology of servo
theory, there is to be no static error.
In terms of $y(p)$, the design of a self-regulating
system reduces to two problems:
(I) Given $y(p)$, to calculate the transient behavior of
the chain of self-regulating repeaters.
(II) The design of a system having a $y(p)$ which leads
to satisfactory transient behavior.
The rest of the memorandum will be concerned largely
with the first of these. The calculations will be carried out
in general terms and the different types of possible responses
will be described in terms of the characteristics of $y(p)$.
* Transit time between repeaters is neglected throughout this
memorandum. More exactly, we choose a different origin of time
at each repeater, so that the transit time does not appear explicitly
in the formulae.
Mathematically the problem discussed in this memorandum
can be formulated as follows: If $y(p)$ represents the common
steady-state transfer ratio of the four-terminal linear units
shown connected in tandem in Figure 1, the output voltage response
of the n-th unit $V_n(t)$ is given by the inverse Laplace integral:
\[ V_n(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y(p)^n\, e^{pt}\, V_0(p)\, dp \tag{1} \]
where $V_0(p)$ represents the spectrum of the input voltage.
For an impulsive input of intensity $V_0$ applied at
time $t = 0$,
\[ V_0(p) = V_0. \]
For a step function input of height $V_0$ applied at
time $t = 0$,
\[ V_0(p) = V_0 / p. \]
Specifically, this memorandum will be devoted to the
study of the behavior of $V_n(t)$ for large values of $n$.
Four-terminal networks are normally classed as low-,
band-, or high-pass depending upon the character of $|y(i\omega)|$.
Typical examples of $|y(i\omega)|$ are shown in Figure 4a, in which,
following the usual practice, $|y(i\omega)|$ has been normalized to be
unity at $\omega = 0$ in the low-pass case; at $\omega = \omega_0$ (the mid-band
frequency) in the band-pass case; and at $\omega = \infty$ in the high-pass
case.
From the viewpoint of the asymptotic behavior of the
system in Figure 1, it is convenient to modify this classification
somewhat when speaking of the over-all gain characteristic,
$|y(i\omega)|^n$, of the transfer ratio of a system comprised of $n$ units.
For sufficiently large $n$, it is clear that $|y(i\omega)|^n$ would lead
to curves of the type shown in Figure 4b corresponding to the
low-pass, band-pass and high-pass curves of Figure 4a. Thus,
for sufficiently large $n$, the gain curves B', C', and D' of
Figure 4b are seen to exhibit the type of behavior normally
associated with a band-pass characteristic. A' and E', on the
other hand, exhibit behavior of the type normally classified as
low-pass and high-pass. For these reasons, the terms low- and
high-pass will henceforth be reserved for those gain characteristics
which are always less than their values at $\omega = 0$ and
$\omega = \infty$, respectively. The term band-pass will be used to
cover all other cases; namely, those in which $|y(i\omega)|$ possesses
one or more maxima at finite frequencies, the values of which
exceed the values of $|y(i\omega)|$ at both zero and infinity.
History of the Problem
Several people have considered this problem in the
above mathematical form. Before proceeding to a discussion of
the results of the general theory, it will be instructive to
consider a few illustrative examples of their results.
Let*
\[ y(p) = \frac{p}{p + 1}. \tag{2} \]
The gain characteristic is clearly of the high-pass
type and satisfies (1) and (2) of Page 6. If the input voltage
is a unit step, then, by the theorem of residues,
\[ V_n(t) = \frac{1}{(n-1)!} \left[ \frac{d^{\,n-1}}{dp^{\,n-1}} \left( p^{\,n-1} e^{pt} \right) \right]_{p=-1} = e^{-t} L_{n-1}(t) \]
where $L_{n-1}(t)$ denotes the Laguerre polynomial of degree $(n-1)$.
A plot of $V_n(t)$ for $n = 1, 2, \ldots, 10$ is shown in Figure 5. It
is known that for large $n$
\[ L_n(t) \cong \frac{1}{\sqrt{\pi}}\, e^{t/2}\, (nt)^{-1/4} \cos\left[ 2(nt)^{1/2} - \frac{\pi}{4} \right] \]
where $\cong$ is to be interpreted as "asymptotically equal to."
Thus
\[ V_n(t) \cong \frac{1}{\sqrt{\pi}}\, e^{-t/2}\, (nt)^{-1/4} \cos\left[ 2(nt)^{1/2} - \frac{\pi}{4} \right]. \]
A plot of the approximate "envelope"
\[ \frac{1}{\sqrt{\pi}}\, e^{-t/2}\, (nt)^{-1/4} \]
is given for n = 50, 100, 150, 200, and 250 in Figure 6.
*This example was first treated by L. A. MacColl (MM-39-325-166),
9/11/39 and W. H. Wise (MM-38-343-22), 8/2/38. The above
treatment follows that of MacColl.
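The closed form for $y(p) = p/(p+1)$ can be cross-checked against a direct time-domain simulation of the chain. The sketch below is an illustration, not part of the memorandum: each stage is modelled as the state equation $x' = -x + u$ with output $u - x$ (a realization of $p/(p+1)$), and is integrated with a simple forward-Euler step. The function names and step size are assumptions.

```python
import math


def laguerre(k, t):
    """Laguerre polynomial L_k(t) = sum_j C(k, j) (-t)^j / j!."""
    return sum(math.comb(k, j) * (-t) ** j / math.factorial(j) for j in range(k + 1))


def step_response(n, t):
    """Closed form V_n(t) = exp(-t) * L_{n-1}(t) for a unit step input."""
    return math.exp(-t) * laguerre(n - 1, t)


def envelope(n, t):
    """Approximate large-n envelope exp(-t/2) (n t)^(-1/4) / sqrt(pi), for t > 0."""
    return math.exp(-t / 2) * (n * t) ** -0.25 / math.sqrt(math.pi)


def simulate_chain(n, t_end, dt=1e-4):
    """Forward-Euler simulation of n identical stages driven by a unit step.

    Each stage has state x with x' = -x + u and output u - x.
    Returns the output of the last stage at time t_end.
    """
    states = [0.0] * n
    for _ in range(round(t_end / dt)):
        signal = 1.0                      # unit step at the chain input
        for i in range(n):
            out = signal - states[i]      # stage output from the current state
            states[i] += dt * (signal - states[i])
            signal = out                  # feeds the next stage
    signal = 1.0
    for i in range(n):                    # read out the outputs at t_end
        signal -= states[i]
    return signal


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, step_response(n, 2.0), simulate_chain(n, 2.0))
```

The two columns printed for each $n$ should agree to within the discretization error of the Euler step.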
The response in this case is seen to be both amplitude
and frequency modulated, the "instantaneous frequency" in
the sense of frequency modulation theory being given by
\[ \omega = \frac{d}{dt}\left( 2(nt)^{1/2} \right) = \left( \frac{n}{t} \right)^{1/2} \]
while the envelope of the amplitude modulation is approximately
exponential. In particular, the type of behavior found here
can be considered satisfactory since there is no tendency for
the magnitude of the largest overshoot to increase without limit
as the number of repeaters is increased. As will be shown
later, this type of behavior is typical of any network having
a high-pass characteristic in the generalized sense of that term
as it has been defined above.
In MM-40-3500-92 dated 10/14/1940, J. G. Kreer and
J. H. Bollman concluded that the appropriate y(p) for a self-
regulating repeater employing a directly heated thermistor
element in the control device was given by
It should be observed that for $\alpha \ne 0$ this transfer
ratio does possess static error. L. A. MacColl in MM-40-130-270
treated this case for $|\alpha| < 1$ and found that the system exhibited
essentially the same type of satisfactory behavior as that
discussed above.
(2) A slightly more complicated example* is given by
\[ y(p) = \frac{p(p + \alpha)}{(p + 1)^2}. \tag{4} \]
It is easily seen that for $\alpha < \sqrt{2}$, $|y(i\omega)|$ is a high-pass
characteristic in that $|y(i\omega)| < 1$ for all finite $\omega$ and
$|y(i\omega)| \to 1$ as $\omega \to \infty$. On the other hand, if $\alpha > \sqrt{2}$,
$|y(i\omega)|$ possesses a maximum greater than 1 at some finite
frequency. $|y(i\omega)|$ is illustrated by curve I in Figure 7 for
$\alpha = 1.4$ (high-pass) and by Figure 8 for $\alpha = 2$ (band-pass).
The response $V_n(t)$ to a unit step function is shown in Figures
9 and 10 for these two cases with $n = 1, 2, \ldots, 9$. The character
of the response is seen to be of a radically different kind
for these two values of $\alpha$.
For $\alpha = 1.4$ the response is seen to be of the same
type as that encountered in the first example. For $\alpha = 2$, on
the other hand, it seems to represent an oscillation in which
the magnitude of the largest overshoot is increasing without
limit as $n$ tends to infinity. Later it will be shown that
this is in fact the case and that satisfactory operation is
impossible for a large number of repeaters in this case.
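The dividing value $\alpha = \sqrt{2}$ follows from $|y(i\omega)|^2 = \omega^2(\omega^2 + \alpha^2)/(\omega^2 + 1)^2$, which exceeds 1 exactly when $(\alpha^2 - 2)\,\omega^2 > 1$. The sketch below is illustrative only: it evaluates the gain and, for $\alpha > \sqrt{2}$, locates the finite-frequency maximum by setting the derivative with respect to $u = \omega^2$ to zero, which gives $u = \alpha^2/(\alpha^2 - 2)$. The function names are assumptions.

```python
import math


def gain(omega, alpha):
    """|y(i*omega)| for y(p) = p (p + alpha) / (p + 1)^2, with omega >= 0."""
    return omega * math.sqrt(omega ** 2 + alpha ** 2) / (omega ** 2 + 1)


def finite_peak(alpha):
    """Return (omega_0, peak gain) of the finite-frequency maximum.

    Returns None when alpha^2 <= 2, in which case the gain rises
    monotonically toward 1 (high-pass in the generalized sense).
    """
    if alpha ** 2 <= 2:
        return None
    omega_0 = math.sqrt(alpha ** 2 / (alpha ** 2 - 2))
    return omega_0, gain(omega_0, alpha)


if __name__ == "__main__":
    for alpha in (1.0, 1.4, 2.0, 3.0):
        print(alpha, finite_peak(alpha))
```

For $\alpha = 2$ this gives a maximum at $\omega_0 = \sqrt{2}$ with gain $2/\sqrt{3}$, so the over-all gain of $n$ stages at that frequency grows as $(2/\sqrt{3})^n$.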
From this and other considerations L. A. MacColl
conjectured that a necessary and sufficient condition that
the response $V_n(t)$ be bounded for all $n$ was that the transfer
ratio $y(p)$ have no net gain at any frequency. Mathematically
expressed, a necessary and sufficient condition that
\[ |V_n(t)| < M \quad \text{for all } n, \]
where $M$ is independent of $n$ and $t$, is that
(M) $|y(i\omega)| \le 1$ for all real frequencies $\omega$.
Physically, the condition on $y(i\omega)$ prevents the transfer ratio
$|y(i\omega)|^n$ for a system using $n$ units from having a tremendous
gain at any particular frequency.
* This case was also treated by L. A. MacColl, but no memorandum
on it was ever written.
In one sense this memorandum could be summarized as
a proof of this conjecture. In particular, a direct proof of
the necessity of MacColl's condition (M) is given in the second
part. The remainder of that part is devoted to an indirect
proof of the sufficiency. The argument consists in exhibiting
the two types of possible responses; the first being that
associated with a y(p) satisfying MacColl's condition and the
second that resulting from a y(p) which violates it at one or
more frequencies.
Statement of Results
The detailed results of the sufficiency argument
are discussed conveniently in terms of the generalized
characterization of high-, band-, and low-pass $y(p)$'s as
given on page 8. The results will be taken up in that order.
High Pass
In terms of the above classification, the class of
high-pass $y(p)$'s consists of just those functions which satisfy
MacColl's condition and are therefore those from which a satisfactory
response could be expected. For the $y(p)$'s in this
class, it is clear on physical grounds that the maximum contribution
to the response $V_n(t)$ of equation (1) will come from the
large values of $|\omega|$ since for these values of $|\omega|$, $|y(i\omega)|^n \to 1$
while for all other values of $|\omega|$, $|y(i\omega)|^n \to 0$. Using the
first three terms of the Laurent expansion of $y(i\omega)$ about $\omega = \infty$,
one finds:
\[ y(i\omega) = 1 + \frac{a\, i}{\omega} + \frac{b}{\omega^2}, \tag{5*} \]
\[ |y(i\omega)| \cong \left[ 1 + \frac{a^2 + 2b}{\omega^2} \right]^{1/2}, \tag{6} \]
\[ \text{Angle } y(i\omega) \cong \frac{a}{\omega}. \tag{7} \]
* It is assumed that $a > 0$, $b < 0$, and that $2b + a^2 \le 0$. These
assumptions correspond to a second order maximum at $|\omega| = \infty$ and
to a monotonic decreasing phase function for $y(p)$ as $|\omega| \to \infty$.
If these approximations, which are valid for $|\omega|$ sufficiently
large, are introduced into equation (1), it can be shown that
the principal contribution to $V_n(t)$ for a unit step input is
given by:
\[ V_n(t) \cong (\pi)^{-1/2}\, (nat)^{-1/4} \exp\left\{ \frac{(a^2 + 2b)\, t}{2a} \right\} \cos\left( 2\sqrt{nat} - \frac{\pi}{4} \right). \tag{8} \]
This, with a suitable interpretation of the constants
$a$ and $b$, is seen to be of the same general form as the response
obtained by MacColl for $y(p) = p/(p + 1)$ as given by equation (
Just as in that example the response is both frequency and amplitude
modulated. The instantaneous frequency of oscillation is
again given by
\[ \omega = \frac{d}{dt}\left( 2(nat)^{1/2} \right) = \left( \frac{na}{t} \right)^{1/2}. \]
The gain for
y(p) = P(P i
(P I D2
is shown on curve I of Figure 11. Curve II of this figure
represents $|y(i\omega)|^{100}$ for this $y(p)$. For this example and
$n = 100$, the true gain $|y(i\omega)|^{100}$ and the gain approximation
resulting from equation (6) are indistinguishable on the scale
of Figure 11.
The corresponding phase characteristic for $y(p)^{100}$
is plotted on Figure 12 where, for reasons which will appear
in Part II, the actual frequency has been replaced by
\[ \omega' = \frac{\omega}{\sqrt{n}}. \]
Again, on the scale of Figure 12 the actual phase is indistinguishable
from the approximation resulting from equation (7).
Figs. 7 and 13 present the same information for
\[ y(p) = \frac{p(p + 1.4)}{(p + 1)^2} \]
and $n = 100$.
Again the agreement between the actual phase and the approximation
is excellent. However, there is a considerable error
in the gain approximation for small $|\omega|$. This large error is
unquestionably due to the fact that the value $\alpha = 1.4$ is near
the critical value $\alpha = \sqrt{2}$ at which the characteristic changes
from high-pass to band-pass.
Agreement with the above asymptotic formula can of
course be obtained by increasing $n$ sufficiently. Alternately,
for $n = 100$, a better approximation to the gain can be obtained
by writing
\[ y(i\omega) = 1 + \frac{a\, i}{\omega} + \frac{b}{\omega^2} + \frac{c\, i}{\omega^3} + \frac{d}{\omega^4} \]
and
\[ |y(i\omega)| \cong \left[ 1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right]^{1/2}. \]
This approximation leads to a curve which is indistinguishable
from that of $|y(i\omega)|^{100}$ in Figure 7. With this approximation,
one finds the following expression for $V_n(t)$ when the input
is a unit step function
*
V (t) * (nj^Cnat)-1/4 cos (2^nat JL ) exp((a^2bU)
( (2d + b2 + 2ac)t2 )
i1 + 2^ ■!
( )
This expression is seen to approach that given by equation (8)
as $n \to \infty$. Thus one can conclude that the response will
always be satisfactory if $y(p)$ belongs to the class of high-pass
characteristics.
Band-Pass Case
MacColl's condition is clearly violated whenever $|y(i\omega)|$
has one or more relative maxima greater than 1 at finite frequencies.
For simplicity the case where $|y(i\omega)|$ has only one such
maximum at $\omega = \omega_0$ will be treated first. It will furthermore be
assumed that this maximum is of the second order; i.e.
\[ \left. \frac{d^2}{d\omega^2}\, |y(i\omega)| \right|_{\omega = \omega_0} \ne 0. \]
Under these conditions, it is physically clear that the maximum
contribution to the response $V_n(t)$ as given by equation (1) will
be due to those frequencies near $\omega_0$, at which $|y(i\omega)|$ possesses
its maximum, since as $n$ increases this region becomes increasingly
more important than all the rest. It is also clear that the time
of maximum response will be given by the delay time experienced
by the frequency $\omega_0$ in passing through the network. This is known
to be given by $t_0 = -n B'(\omega_0)$ where $B'(\omega_0)$ denotes the slope of
the phase characteristic $B(\omega)$ in the expression
\[ y(i\omega) = A(\omega) \exp\left( iB(\omega) \right). \tag{10} \]
If $A(\omega)$ and $B(\omega)$ are expanded in a Taylor's series about $\omega = \omega_0$
and terms up to the second order retained, it can be shown that
the response to a unit impulse function is given by
(ii) vn(t) = A(^Jn
VZn
G(u0) exp (
-(t-to)cH(0)n)
o/ ) cos |u>Qt + nB(uQ)
where
0(»0) - n-V8j
(
( —
A"("0)
-1/4
* CB»»(w0)n
H(«0)
A' '(cup)
(I A"l«Q)
2>
> 0
(B"(w0) A{« J)
io((,o) = arctanj 2a,,([Uq) )
)
tQ = -nB(wQ) .
Thus $V_n(t)$ can be interpreted as an amplitude modulated
wave with an envelope proportional to the Gauss error curve
\[ \exp\left\{ \frac{-(t - t_0)^2}{2n\, H(\omega_0)} \right\} \]
with a standard deviation given by
(
(
( n
(
(
(A
)2
- )l/2
(B"(U)Q))2
J )
The standard deviation $\sigma$ is of course a convenient measure of the
duration of the disturbance. The maximum response occurs for time
$t = -n B'(\omega_0)$ at which time the amplitude is proportional to
\[ \frac{A(\omega_0)^n}{\sqrt{n}}. \]
Thus if $A(\omega_0) > 1$, the maximum response will represent a value
which is very large compared with unity, the magnitude of the
original disturbance, if $n$ is large. This would force any system
involving vacuum tubes to overload if $n$ were sufficiently large.
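The growth implied by the factor $A(\omega_0)^n / \sqrt{n}$ is easy to tabulate. The sketch below is illustrative: it uses the value $A(\omega_0) = 2/\sqrt{3}$, which is the peak gain of $y(p) = p(p + 2)/(p + 1)^2$ at $\omega_0 = \sqrt{2}$, and compares it with a hypothetical peak gain below unity. Only the proportionality factor is computed, since the constant of proportionality depends on the second derivatives of $A$ and $B$.

```python
import math


def peak_amplitude_scale(a0, n):
    """The factor A(omega_0)^n / sqrt(n) to which the peak response is proportional."""
    return a0 ** n / math.sqrt(n)


if __name__ == "__main__":
    a_band = 2.0 / math.sqrt(3.0)   # peak gain for alpha = 2 (band-pass case)
    a_safe = 0.95                   # hypothetical peak gain below unity
    for n in (1, 10, 50, 100, 200):
        print(n, peak_amplitude_scale(a_band, n), peak_amplitude_scale(a_safe, n))
```

With a peak gain only about 15 percent above unity the factor already exceeds $10^5$ at $n = 100$, while any peak gain below unity drives it toward zero.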
These properties are summarized in Figures (14) and
(15). Figure (14) is a plot of the response for values of $t$
near $t_0$ for a few values of $n$ for the example given by equation
(4) where $\alpha = 2$. Figure (15) is a plot of the maximum response
for a few values of $n$ for different values of the parameter $\alpha$.
It should be remarked that the above approximation to
the gain which was obtained by keeping only the first two terms
of the expansion of $A(\omega)$ about $\omega = \omega_0$ could only be expected to
be a reasonable one for fairly large values of $n$, since it
represents a usually unsymmetric gain characteristic by a
symmetric function. A better or second approximation can be
obtained by using three terms of the Taylor's expansion instead
of two. Just as in the high-pass case, the retention of this
extra term gives rise to a second term in the expression for
$V_n(t)$ but it does not fundamentally alter the characteristics
of the response since the correction term vanishes for $t = t_0$,
at which time the response is still a maximum, with the same
amplitude as before. Its only effect is to take cognizance of
the unsymmetrical character of the gain characteristic $A(\omega)$ and
to change the resulting response envelope to an unsymmetrical
one. Of course, it also modifies the phase of the oscillation
inside the envelope in a complicated way without changing the
fundamental frequency of oscillation.
For these reasons and because of the complexity of the
resulting expression, it will not be written down here explicitly
although the explicit approximation to the gain A(w) will be
discussed in Part II.
The two approximations to the gain are illustrated for
equation (4) with a = 2 in Figure 16 for n = 100, In this case
. . |u)|-/)2 + 4
A(u) = 5 •
(iT + 1
As can be seen from the figure, the second approximation does in
fact represent $A(\omega)$ over the significant range of frequencies
near $\omega_0$, from which it can be concluded that the response will be
unsatisfactory. Figure (14), previously referred to, furnishes
a picture of the envelope response as obtained from the first
approximation.
In the event that $A(\omega)$ takes on its maximum value at
more than one place in the finite frequency range, it is clear
that the above results can be generalized as follows:
Let $V_{ni}(t)$ be the response of the form given by equation
(11) due to a maximum at $\omega = \omega_i$. Let the time of maximum response
from this maximum be denoted by $t_i = -nB'(\omega_i)$. Then the total
response is clearly given by the expression
\[ V_n(t) = \sum_{i=1}^{k} V_{ni}(t), \]
if there are k relative maxima. Unless the values of $A(\omega)$ at
the points $\omega = \omega_i$ are nearly the same, it is also clear that
only those terms of the above sum which correspond to the largest
maxima of $A(\omega)$ will be of significance.
The band-pass case is also discussed briefly for unit
step inputs in Part II.
Low Pass Case
Since the low-pass case differs from the band-pass case
only in that $A(\omega)$ has its maximum for $\omega = 0$ instead of at $\omega = \omega_0 \neq 0$,
the results of the two are very similar. The results in
the low-pass case are simpler because it will be recalled that
$B(\omega)$ (as defined by equation 10) is an odd function of $\omega$ for any
physical network. This forces both $B(0)$ and $B''(0)$ to be zero so
that for an impulsive input one obtains the simple formula:
(12) j It) Vim In"3/2
n -/2n (
A"(0)
-1/2) (t-tQ)2 A(0))
J exp [ 2n A'* (0)j
This result corresponds to the well-known formula from
transmission line theory for non-distortionless lines.
Remarks
From the practical viewpoint the above results have the
following implications for communications systems such as a
cross-country coaxial telephone system employing self-regulation
repeaters spaced at intervals of a few miles.
(1) If the transfer characteristic of each individual
network is of the high-pass type (in the sense in which this term
has been used above) then the transient response will never exceed
the initial value of the disturbing input voltage and it will
be damped out so that the operation of the communication system
would generally be considered satisfactory.
(2) If the network is not of the high-pass type, the
usual practical case, and there is any net gain in the system,
which is peaked at $\omega_0$, then for even a small number of units the
response will exceed the initial input at the time given by
\[ t_0 = -nB'(\omega_0) \]
where
\[ A'(\omega_0) = 0, \]
and if the number of units is sufficiently large the output
from the n-th unit will be large enough to cause severe overloading.
At first glance these implications are not promising
and seem to indicate that the operation of a cross-country
system involving several hundred repeaters and regulators would
be extremely difficult, since the only satisfactory characteristic
is difficult to attain in practice. However, practically the
ideal characteristic, which is high pass, can be approached in the
sense that the peaked frequency can be made very large. Thus
the maximum response may occur so soon after the initial disturbance
that the physical system would not be able to follow it or
to distinguish it from the initial disturbance, which in many
cases would be large enough to cause momentary overloading of the
system.
Moreover, it is an experimental fact that in the design
of feedback regulator characteristics, forcing the peaked frequency
higher reduces the size of the peak, which in turn will permit the
use of a larger number of regulators in the system.
If this is done, the time of maximum response, $t_0 = -nB'(\omega_0)$,
will be small since $B'(\omega)$ in general is small for large
$\omega$. Assuming that the effects of the maximum response have been
treated in this way, it is natural to inquire into the type of
response which will result for finite values of $t > t_0$.
If one examines the gain characteristic curve of the
type shown in Figure (7), it is clear that for frequencies less
than some frequency $\omega$, slightly less than the peak frequency $\omega_0$,
the shape is fundamentally like that of the high-pass case.
Remembering that the phase delay of a frequency through a linear
network is given by the slope of the phase characteristic at that
frequency, it is clear that the response for values of t greater
than $t_0$, the time of maximum response, will come from the frequencies
less than $\omega_0$, since the phase slope characteristic is
large for small frequencies and small for large frequencies.
Now if it is assumed that the phase characteristic $nB(\omega)$ is a
monotonic decreasing function of $\omega$, it is clear that the function
$(nB(\omega) + \omega t)$ will always be stationary at an arbitrary frequency
$\omega$, provided that t is given a suitable corresponding value. Thus,
it is reasonable to expect that the response for $t \gg t_0$* will
exhibit the same type of character as that obtained in the high-pass
case discussed above. This, it will be recalled, is both
frequency and amplitude modulated with an envelope which decreases
approximately exponentially. Thus, under these circumstances it
seems reasonable to suppose that satisfactory operation of the
communication link could be obtained.
To recapitulate, the most practical design for any
system of the type envisaged in Figure 1, from the viewpoint of
satisfactory transient response involves approaching the high-
pass characteristic as closely as possible by making the gain
characteristic of the transfer ratio peak at as high a frequency
as is practicable and by keeping the phase slope characteristic
monotonic for all smaller frequencies.
PART II
Mathematical Discussion
Theorem I. A necessary condition that the response $V_n(t)$ from a
chain of n four-terminal linear invariable networks subject to a
unit step input function have a common finite bound for all n is
that the transfer ratio $y(p)$ satisfy the relation
(M) $|y(i\omega)| \le 1$ for all real values of $\omega$.
* A different type of expansion, valid for any fixed t as $n \to \infty$,
is discussed at the end of Part II.
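Proof: By hypothesis
\[ |V_n(t)| < M \quad \text{for all } n, \text{ where } M \text{ is independent of } n \text{ and } t, \]
so that, with
\[ V_n(p) = \int_0^\infty e^{-pt}\, V_n(t)\, dt \]
and the unit step input giving $V_n(p) = y(p)^n / p$,
\[ y(p)^n = p\, V_n(p), \]
\[ |y(p)|^n = |p| \left| \int_0^\infty e^{-pt}\, V_n(t)\, dt \right| \le |p| \int_0^\infty |e^{-pt}|\, |V_n(t)|\, dt < |p|\, M \int_0^\infty |e^{-pt}|\, dt. \]
If $p = c + i\omega$ and if $c > 0$, then
\[ |y(p)|^n < \frac{M \sqrt{c^2 + \omega^2}}{c}, \]
so that
\[ \log |y(p)| < \frac{1}{n} \log \frac{M \sqrt{c^2 + \omega^2}}{c}. \]
Thus, in the limit as $n \to \infty$, it follows that for any
p with a positive real part
\[ \log |y(p)| \le 0 \]
and hence
\[ |y(p)| \le 1. \]
Since this relation holds everywhere in the right-hand half
plane, it follows from simple continuity considerations that
the maximum of $|y(i\omega)|$ never exceeds 1. Thus
\[ |y(i\omega)| \le 1 \]
as was to be shown.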
The remaining discussion will be devoted to the
characterization of the different types of possible responses
and will, at the same time, furnish an indirect proof of the
fact that the condition (M) on y(p) is also sufficient.
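Condition (M) lends itself to a quick numerical probe. The following is only a sketch under stated assumptions: the two rational transfer ratios `y_high_pass` and `y_peaked`, the logarithmic frequency grid, and the helper `peak_gain` are invented for illustration and do not appear in the memorandum. It samples $|y(i\omega)|$ on real frequencies and shows how a peak gain above 1 compounds over a chain of 100 units.

```python
import math

def peak_gain(y, w_min=1e-3, w_max=1e3, points=6001):
    """Largest |y(i w)| found on a logarithmic grid of real frequencies w."""
    best = 0.0
    for k in range(points):
        w = w_min * (w_max / w_min) ** (k / (points - 1))
        best = max(best, abs(y(1j * w)))
    return best

# Hypothetical transfer ratios, chosen only for illustration.
def y_high_pass(p):
    return p / (p + 1)                         # gain w / sqrt(1 + w^2), below 1

def y_peaked(p):
    return (p * p + 2 * p) / (p * p + p + 1)   # gain sqrt(5) at w = 1

for name, y in (("high-pass", y_high_pass), ("peaked", y_peaked)):
    g = peak_gain(y)
    verdict = "satisfies" if g <= 1.0 else "violates"
    print(f"{name}: peak gain {g:.4f} {verdict} (M); gain^100 = {g ** 100:.3e}")
```

A grid search can only suggest that (M) holds; a peak lying between grid points or beyond `w_max` would be missed, so a finer grid or an analytic bound is needed for a firm conclusion.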
High Pass Case - Unit Step Input
If the networks comprising the system shown in
Figure 1 possess a transfer ratio having a high-pass gain characteristic
in the sense defined above, and if one writes
\[ y(i\omega) = A(\omega)\, e^{iB(\omega)} \]
then the gain function $A(\omega)$ satisfies the two conditions
(A) $A(\omega) < 1$ for all finite frequencies $\omega$.
(B) $\lim_{\omega \to \infty} A(\omega) = 1$.
Under these conditions it is clear that, for sufficiently large n,
the main contributions to $V_n(t)$ will be due to the high values of
$|\omega|$. For convenience, $V_n(t)$ is written here in slightly different
form:
\[ V_n(t) = \frac{1}{\pi} \operatorname{Re} \int_0^\infty A(\omega)^n\, e^{i[nB(\omega) + \omega t - \pi/2]}\, \frac{d\omega}{\omega}. \]
For large values of $|\omega|$, all physical transfer ratios $y(i\omega)$
of interest to us here can be represented by an expansion
of the form*
\[ y(i\omega) = 1 + \frac{ai}{\omega} + \frac{b}{\omega^2} + \frac{ci}{\omega^3} + \frac{d}{\omega^4} + \cdots \tag{13} \]
We shall confine our attention to the ordinary case, in which
$a > 0$, $b < 0$ and $2b + a^2 < 0$. For large values of $|\omega|$, we now
have
\[ A(\omega) = \left[ \left( 1 + \frac{b}{\omega^2} + \frac{d}{\omega^4} + \cdots \right)^2 + \left( \frac{a}{\omega} + \frac{c}{\omega^3} + \cdots \right)^2 \right]^{1/2} \tag{14} \]
\[ B(\omega) = \arctan \frac{\dfrac{a}{\omega} + \dfrac{c}{\omega^3} + \cdots}{1 + \dfrac{b}{\omega^2} + \dfrac{d}{\omega^4} + \cdots} \tag{15} \]
It is clear that, for $|\omega|$ sufficiently large, the
leading terms of these expressions will furnish adequate approximations
to $A(\omega)$ and $B(\omega)$. These are:
\[ A(\omega) = \left[ 1 + \frac{a^2 + 2b}{\omega^2} \right]^{1/2} \tag{16} \]
\[ B(\omega) = \frac{a}{\omega}. \tag{17} \]
Let $\omega_0$ be the frequency defined by the condition that
these approximations are accurate to within the arbitrarily chosen
permissible error $\epsilon$ for values of $\omega$ such that $\omega > \omega_0$. Then we
can write
* In the usual case y(p) is a rational function, so that this
expansion can be readily obtained.
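\[ V_n(t) = \frac{1}{\pi} \operatorname{Re} \left[ \int_0^{\omega_0} + \int_{\omega_0}^{\infty} \right] A(\omega)^n\, e^{i[nB(\omega) + \omega t - \pi/2]}\, \frac{d\omega}{\omega} = \frac{1}{\pi} \operatorname{Re}\,(I_1 + I_2). \]
It is clear that
\[ |I_1| \le \int_0^{\omega_0} \frac{[A(\omega)]^n}{|\omega|}\, d\omega. \]
Since $[A(\omega)]^n \to 0$ for each $\omega$ in the finite range $0 < \omega < \omega_0$,
it is clear that $|I_1|$ can be made negligibly small by taking
n sufficiently large. Introducing the new variable v defined
by the relation
\[ v = \omega \sqrt{\frac{t}{na}}, \]
$I_2$ can be written as
\[ I_2 = \int_{v_0}^{\infty} \left[ 1 + \frac{(a^2 + 2b)t}{nav^2} \right]^{n/2} e^{i[\sqrt{nat}\,(1/v + v) - \pi/2]}\, \frac{dv}{v}, \qquad v_0 = \omega_0 \sqrt{\frac{t}{na}}. \]
Letting
\[ x = \frac{(a^2 + 2b)t}{av^2} \]
and using the binomial expansion, one has
\[ \left[ 1 + \frac{(a^2 + 2b)t}{nav^2} \right]^{n/2} = \left( 1 + \frac{x}{n} \right)^{n/2} = 1 + \frac{n}{2}\,\frac{x}{n} + \frac{\frac{n}{2}\left( \frac{n}{2} - 1 \right)}{2!} \left( \frac{x}{n} \right)^2 + \cdots \]
\[ = 1 + \frac{x}{2} + \frac{1}{2!} \left( 1 - \frac{2}{n} \right) \left( \frac{x}{2} \right)^2 + \cdots = e^{x/2} + \text{terms in } 1/n. \]
Thus, for sufficiently large n, $I_2$ becomes, approximately,
\[ I_2 \approx \int_{v_0}^{\infty} \frac{1}{v}\, e^{(a^2 + 2b)t/(2av^2)}\, e^{i[\sqrt{nat}\,(1/v + v) - \pi/2]}\, dv. \tag{18} \]
In this form the principle of stationary phase can be applied to
$I_2$ (Cf. Appendix I); for the amplitude factor
\[ \frac{1}{v}\, e^{(a^2 + 2b)t/(2av^2)} \]
is independent of n, while the phase function (in the notation
of the appendix)
\[ \varphi(v) = \frac{1}{v} + v \tag{19} \]
is monotonic in the range of integration on each side of the
stationary point ($v = 1$) where
\[ \varphi'(v) = 0. \]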
Physically speaking, the form of equation (18) suggests
the interpretation of $V_n(t)$ as the sum of an infinite number of
complex waves whose amplitudes are slowly varying functions of v
and whose complex phases are rapidly varying functions of v.
Under this interpretation it is physically reasonable to expect
that wave interference will occur everywhere except near $v = 1$,
where the phase function given by equation (19) is stationary.
This is the principle of stationary phase. It remains to
evaluate the principal contribution to $I_2$ for values of v near 1.
Replacing $\varphi(v)$ by the first three terms of its Taylor's series
about $v = 1$,
\[ \varphi(v) = \varphi(1) + 0 + \frac{\varphi''(1)}{2} (v - 1)^2 = 2 + (v - 1)^2, \]
the main contribution to $I_2$ is given by
\[ \int_{1-\eta}^{1+\eta} \frac{1}{v}\, e^{i[2\sqrt{nat} - \pi/2]}\, e^{(a^2 + 2b)t/(2av^2)}\, e^{i\sqrt{nat}\,(v - 1)^2}\, dv. \]
In the interval $(1 - \eta, 1 + \eta)$, the amplitude factor
\[ \frac{1}{v} \exp\left[ (a^2 + 2b)t / 2av^2 \right] \]
is substantially constant and may be removed from under the
integral sign and evaluated at $v = 1$. By the reasoning of
Appendix I, the contributions to the remaining integral are
not appreciably affected if the limits are changed to $(-\infty, \infty)$
respectively. Letting
\[ \xi = v - 1 \]
we can then write $I_2$ in the form
\[ I_2 \approx \exp\left[ \frac{(a^2 + 2b)t}{2a} \right] \exp\left[ i\left( 2\sqrt{nat} - \frac{\pi}{2} \right) \right] \int_{-\infty}^{\infty} e^{i\sqrt{nat}\,\xi^2}\, d\xi. \]
By the known properties of Fresnel integrals
\[ \int_{-\infty}^{\infty} e^{i\sqrt{nat}\,\xi^2}\, d\xi = \sqrt{\pi}\,(nat)^{-1/4}\, e^{i\pi/4} \]
and hence
\[ I_2 \approx \sqrt{\pi}\,(nat)^{-1/4} \exp\left[ \frac{(a^2 + 2b)t}{2a} \right] \exp\left[ i\left( 2\sqrt{nat} - \frac{\pi}{4} \right) \right]. \]
Taking the real part and dividing by $\pi$, the asymptotic expression
for $V_n(t)$ is therefore given by:
\[ V_n(t) = \pi^{-1/2}\,(nat)^{-1/4} \exp\left[ \frac{(a^2 + 2b)t}{2a} \right] \cos\left( 2\sqrt{nat} - \frac{\pi}{4} \right), \tag{20} \]
which is equation (8) of Part I.
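Equation (20), read as $V_n(t) \approx \pi^{-1/2}(nat)^{-1/4} \exp[(a^2 + 2b)t/2a] \cos(2\sqrt{nat} - \pi/4)$, can be checked against a case with a closed-form answer. The sketch below rests on an assumption that is not in the memorandum: each unit is taken to be an isolated high-pass section with $y(p) = p/(p+1)$, for which $y(i\omega) = 1 + i/\omega - 1/\omega^2 - \cdots$, so that $a = 1$, $b = -1$ and $2b + a^2 < 0$. For that choice the unit-step response of n units is $e^{-t} L_{n-1}(t)$, where $L_k$ is the Laguerre polynomial, because the Laplace transform of $e^{-t} L_k(t)$ is $p^k/(p+1)^{k+1}$. The function names are illustrative.

```python
import math

def laguerre(k, x):
    """Laguerre polynomial L_k(x) by the standard three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - x
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - x) * cur - j * prev) / (j + 1)
    return cur

def exact_step_response(n, t):
    """Step response of n isolated sections with y(p) = p/(p+1) (assumed example)."""
    return math.exp(-t) * laguerre(n - 1, t)

def asymptotic_step_response(n, t, a, b):
    """Equation (20): pi^-1/2 (nat)^-1/4 exp((a^2+2b)t/2a) cos(2 sqrt(nat) - pi/4)."""
    nat = n * a * t
    envelope = math.exp((a * a + 2 * b) * t / (2 * a)) / (math.sqrt(math.pi) * nat ** 0.25)
    return envelope * math.cos(2 * math.sqrt(nat) - math.pi / 4)

# y(i w) = i w / (1 + i w) = 1 + i/w - 1/w^2 - ...  gives a = 1, b = -1.
n, a, b = 400, 1.0, -1.0
for t in (0.5, 1.0, 2.0):
    print(t, exact_step_response(n, t), asymptotic_step_response(n, t, a, b))
```

For this example the two columns should agree closely at n = 400, with the residual difference shrinking as n grows; the comparison says nothing about transfer ratios outside the high-pass class.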
A more accurate approximation to the gain $A(\omega)$ is
given by
\[ A(\omega) = \left[ 1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right]^{1/2} \]
where the first three terms of equation (13) have been retained.
From this it follows that:
\[ A(\omega)^n \approx \exp\left\{ \frac{n}{2} \left[ \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right] \right\} = \exp\left[ \frac{n(2b + a^2)}{2\omega^2} \right] \exp\left[ \frac{n(2d + b^2 + 2ac)}{2\omega^4} \right], \]
from which it follows that the second approximation is obtained by
multiplying the first by the factor
\[ \exp\left[ \frac{n(2d + b^2 + 2ac)}{2\omega^4} \right]. \]
If the frequency transformation $v = \omega\sqrt{t/na}$ is now made,
the first factor will as before be independent of n. Over the
range of integration where the integral is significant their
product can be removed from under the integral sign, giving
\[ V_n(t) = \pi^{-1/2}\,(nat)^{-1/4} \cos\left( 2\sqrt{nat} - \frac{\pi}{4} \right) \exp\left[ \frac{(a^2 + 2b)t}{2a} \right] \exp\left[ \frac{(2d + b^2 + 2ac)t^2}{2a^2 n} \right] \]
\[ \approx \pi^{-1/2}\,(nat)^{-1/4} \cos\left( 2\sqrt{nat} - \frac{\pi}{4} \right) e^{(a^2 + 2b)t/2a} \left[ 1 + \frac{(2d + b^2 + 2ac)t^2}{2a^2 n} + \cdots \right], \]
which is the equation (9) of Part I.
Band Pass Case - Impulsive- Input
For simplicity let it be assumed that the gain characteristic
$A(\omega)$ has only one absolute maximum at $\omega = \omega_0$ on the
positive frequency range and that this is a second order maximum.
The response $V_n(t)$ can always be written in the form
\[ V_n(t) = \frac{A(\omega_0)^n}{\pi} \operatorname{Re} \int_0^\infty \exp\left\{ n \log \frac{A(\omega)}{A(\omega_0)} + inB(\omega) + i\omega t \right\} d\omega. \]
In this form, $V_n(t)$ can again be interpreted as being proportional
to the sum of an infinite number of complex waves of amplitude
\[ \exp\left\{ n \log \frac{A(\omega)}{A(\omega_0)} \right\} \]
with varying complex phase* given by
\[ \varphi(\omega, t, n) = nB(\omega) + \omega t. \]
With this interpretation it is clear that the maximum contribution
to $V_n(t)$ will be given by those frequencies in the
neighborhood of $\omega_0$, where $\omega_0$ satisfies $A'(\omega) = 0$, and at values
of the time t near $t_0$ at which the phase function $\varphi(\omega, t, n)$
is stationary for the maximum frequency $\omega_0$. Thus $t_0$ is given
by
\[ t_0 = -nB'(\omega_0). \]
Since
\[ A(\omega_0) \neq 0 \quad \text{and} \quad A'(\omega_0) = 0 \]
* "Phase" as used here differs from the way it is normally used
in engineering.
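one can write for a suitably small neighborhood of $\omega_0$
\[ \log \frac{A(\omega)}{A(\omega_0)} = \frac{A''(\omega_0)}{2A(\omega_0)} (\omega - \omega_0)^2 + \frac{A'''(\omega_0)}{6A(\omega_0)} (\omega - \omega_0)^3 + \cdots \tag{21} \]
If we retain only the first term of this expansion, then for a
suitably restricted neighborhood of $\omega_0$, one has
\[ \left[ \frac{A(\omega)}{A(\omega_0)} \right]^n = e^{n \log \frac{A(\omega)}{A(\omega_0)}} \approx \exp\left[ \frac{nA''(\omega_0)}{2A(\omega_0)} (\omega - \omega_0)^2 \right]. \tag{22} \]
Similarly, for $\omega$ sufficiently near $\omega_0$
\[ B(\omega) = B(\omega_0) + B'(\omega_0)(\omega - \omega_0) + \frac{B''(\omega_0)}{2} (\omega - \omega_0)^2. \tag{23} \]
Henceforth for simplicity, we shall write
\[ A = A(\omega_0), \quad A'' = A''(\omega_0), \quad B = B(\omega_0), \quad B' = B'(\omega_0), \quad B'' = B''(\omega_0). \]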
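If these approximations are valid in the neighborhood
$(\omega_0 - \Delta, \omega_0 + \Delta)$, it follows that
\[ V_n(t) = \frac{1}{\pi} \operatorname{Re} \left\{ \left[ \int_0^{\omega_0 - \Delta} + \int_{\omega_0 + \Delta}^{\infty} \right] A(\omega)^n\, e^{i[nB(\omega) + \omega t]}\, d\omega \right. \]
\[ \left. {} + A^n \int_{\omega_0 - \Delta}^{\omega_0 + \Delta} \exp\left[ \frac{nA''}{2A} (\omega - \omega_0)^2 + i\left( nB + nB'(\omega - \omega_0) + \frac{nB''}{2} (\omega - \omega_0)^2 + \omega t \right) \right] d\omega \right\}. \]
Since $[A(\omega)]^n \to 0$ as $n \to \infty$, except near $\omega = \omega_0$, it follows as
before that the sum of the bracketed integrals can be made
negligibly small in comparison with the remaining one if n is
taken sufficiently large. Recalling that
\[ t_0 = -nB'(\omega_0), \]
the remaining integral can be written as
\[ V_n(t) = \frac{1}{\pi} \operatorname{Re} \left\{ A^n e^{i(nB + \omega_0 t)} \int_{\omega_0 - \Delta}^{\omega_0 + \Delta} \exp\left[ \left( \frac{nA''}{2A} + \frac{inB''}{2} \right) (\omega - \omega_0)^2 + i(t - t_0)(\omega - \omega_0) \right] d\omega \right\}. \]
Again the finite limits of integration can be replaced by $-\infty$
and $\infty$ since, for large n,
\[ e^{\frac{nA''}{2A} (\omega - \omega_0)^2} \]
will be small except in the immediate neighborhood of $\omega_0$.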
If one sets
\[ \rho = -\frac{n}{2} \left( \frac{A''}{A} + iB'' \right), \qquad p^2 = i^2 (\omega - \omega_0)^2, \qquad g = t - t_0, \]
then the remaining integral can be recognized as pair No. 710.0
of the Campbell and Foster Tables.
Then one finds
Vn(t) = —372" Re {{ An expCinB+io)0t3 exp [-(t-tQ)2]
2n°/& (
( VP
4p
The result is equivalent to that given by equation (11)
of Part I. If $A(\omega_0)$ is greater than 1, it is thus seen that the
response will have a maximum value that builds up very rapidly
as n increases and would eventually force any system involving
vacuum tubes to overload.
It should be remarked that the above approximation
to the gain could only be expected to be a reasonable one for
fairly large values of n, since it represents a usually unsymmetric
gain characteristic by a symmetric function. A better
or second approximation can be obtained by keeping the second
term of the expansion of the logarithm in (21), and then taking
the first term of the expansion of
\[ e^{\frac{nA'''}{6A} (\omega - \omega_0)^3}. \]
This yields
\[ \left[ \frac{A(\omega)}{A(\omega_0)} \right]^n \approx e^{\frac{nA''}{2A} (\omega - \omega_0)^2} \left[ 1 + \frac{nA'''}{6A} (\omega - \omega_0)^3 \right]. \]
The addition of the second term in the above expression
gives rise to an additional term in $V_n(t)$, provided
that the same phase approximation (23) is retained. The
resulting $V_n(t)$ is similar to (11) but the new envelope consists
of the old envelope plus $nA'''/6A$ times the third derivative
of the old envelope. The modulated frequency remains
the same but the phase is changed in a complicated manner.
(Compare pair 710.3 of the Campbell and Foster tables.)
Unit Step Input
In this case one can write
\[ V_n(t) = \frac{1}{\pi} \operatorname{Re} \int_0^\infty A(\omega)^n\, e^{i[nB(\omega) + \omega t - \pi/2]}\, \frac{d\omega}{\omega}. \]
As before the only significant frequencies are in the neighborhood
of $\omega = \omega_0$, and near this point the $\omega$ in the denominator
can be taken out of the integral as $1/\omega_0$ provided $\omega_0 \neq 0$. Thus
the result will be the same as for the impulsive input apart from
the factor $1/\omega_0$ if one makes $nB(\omega) - \pi/2$ correspond to $nB(\omega)$
in (11).
Low-Pass Case
It is clear that the analysis for this case, in which
the equation $A'(\omega) = 0$ is satisfied for $\omega = 0$, can be carried
through in exactly the same manner as the band-pass case treated
previously. The resulting answer is capable of simplification,
however, if it is recalled that $B(\omega)$ for any physical network
is an odd function of $\omega$. This forces both $B(0)$ and $B''(0)$ to
be zero. The resulting formulae then become
a) Impulsive Input
b) Unit Step Input
(24)
A(0)n e W A(0)
2n A"(Cfr
Tt
3/2
vn(t)
A(o)
n
3/2 /2nA' Ha)
n J A(Gj
,t
(-(t-tQ)2A(o))
exp j 2nA"(») jdt'
This last expression involves an integral since it
is necessary to eliminate the pole at zero where $A(\omega)$ has its
maximum. This can be done by differentiating $V_n(t)$ with respect
to t, finding the asymptotic formula for $V_n'(t)$ as before,
and then integrating to obtain (24).
Hamy's Expansions in the Band-Pass Case
The type of asymptotic expansions so far given for
the band-pass case were explicitly designed to represent $V_n(t)$
in the neighborhood of $t = t_0$ where $V_n(t)$ is a maximum. They
could in no sense be considered the true asymptotic expansions
for values $t \ll t_0$ or $t \gg t_0$. In particular their derivation
depended upon the fact that the time of maximum response was
related to the number of four terminal networks by means of
the equation
\[ t_0 = -nB'(\omega_0), \]
so that as $n \to \infty$, $t_0 \to \infty$.
Other types of expansion are clearly possible.
Two obvious alternatives are:
(1) Those valid for fixed n as $t \to \infty$;
(2) Those valid for fixed t as $n \to \infty$.
The first of these will not be considered here since
they are of little interest as all of the four terminal networks
have been assumed to be absolutely stable. The interested reader
is referred to the book by Doetsch on Laplace Transformations
for expansions of this type.
Since the second type of expansion is of interest
here and is not to be found in most of the standard reference
works it will be discussed here briefly.
In a classic paper, M. Hamy* derived general expansions
of this type for complex integrals of the form
\[ \int f(z)\, \varphi^n(z)\, dz \]
* Journal de Mathematique, vol. 4, 6th series, 1908, page 203.
under a variety of hypotheses on $f(z)$ and $\varphi(z)$. These conditions
include the case where $\varphi(z)$ has a saddle point given
by the solution of $\varphi'(z) = 0$, and the result of this case is a
generalization of the often-used theorem of Fowler which one
finds in his book on statistical mechanics under the title of
the saddle point method.
More to the point, they also include the case
where $\varphi(z)$ has one or more maxima on the path of integration
at which $\varphi'(z) = 0$, provided that $f(z)$ admits a Taylor series
expansion about these points. In particular, then, if one
considers t as a fixed parameter they apply to the integral
of equation (1), with $c = 0$ and $\varphi(z) = y(p)$; $f(z) = e^{pt} v_0(p)$.
In terms of our notation, one finds that:
(a) for an impulsive input with gain maxima at <*) = wQ
2An(cO x
VtJ ~ nB'(a>°) COS rV + n B(u,o):i + term in ^ *
(b) for a unit step input with gain maxima at w = uQ f 0.
2An(w ) ,
Vn(t) ?a COS [V + nB^o]^ + termS in — '
■ v o' o n
It is interesting to note that these formulae indicate
a dependence upon $1/n$ instead of $1/\sqrt{n}$ as in the case of the
previous expansion. These formulae can be thought of as representing
the response in the band-pass case for any fixed t,
$t \ll t_0$.
Appendix I
Certain remarks of Aurel Wintner* on the justification
of the principle of stationary phase are pertinent enough to
the above discussion to bear repetition here. In order for the
integral
\[ S = \int f(x)\, e^{ip\varphi(x)}\, dx \tag{25} \]
to be asymptotically represented as $p \to \infty$ by the formula
(Cf. Lamb, Hydrodynamics p 395)
\[ \frac{\sqrt{2\pi}\, f(a)}{\sqrt{|p|\, |\varphi''(a)|}}\, e^{i[p\varphi(a) \pm \pi/4]} \tag{26} \]
where $\varphi'(a) = 0$ and where the upper or lower sign is to be
taken according as $\varphi''(a)$ is positive or negative, it is
evident that two things are sufficient.
(1) The contribution to the integral outside a small interval
around the stationary value a of $\varphi(x)$ must decrease more
rapidly as a function of p than the one obtained in the
neighborhood of a;
(2) The asymptotic formula given above must adequately represent
the behavior of the contribution to the integral
from the neighborhood of the stationary value a.
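The leading stationary-phase term, $\sqrt{2\pi / (p\,|\varphi''(a)|)}\, f(a)\, e^{i[p\varphi(a) \pm \pi/4]}$, can be compared with an integral whose value is known in closed form. The sketch below uses a test case chosen for this illustration and not taken from the memorandum: $f(x) = e^{-x^2}$ and $\varphi(x) = x^2$, for which the integral over the real line equals $\sqrt{\pi/(1 - ip)}$ (principal branch). The function names are hypothetical.

```python
import cmath
import math

def stationary_phase(f_a, phi_a, phi2_a, p):
    """Leading term sqrt(2 pi / (p |phi''(a)|)) f(a) exp(i [p phi(a) +/- pi/4])."""
    sign = 1.0 if phi2_a > 0 else -1.0
    amplitude = f_a * math.sqrt(2 * math.pi / (p * abs(phi2_a)))
    return amplitude * cmath.exp(1j * (p * phi_a + sign * math.pi / 4))

def exact_gaussian_case(p):
    """Closed form of the integral of exp(-x^2) exp(i p x^2) over the real line."""
    return cmath.sqrt(math.pi / (1 - 1j * p))

def relative_error(p):
    # Stationary point a = 0 with f(0) = 1, phi(0) = 0, phi''(0) = 2.
    approx = stationary_phase(1.0, 0.0, 2.0, p)
    exact = exact_gaussian_case(p)
    return abs(exact - approx) / abs(exact)

for p in (10.0, 100.0, 1000.0):
    print(p, relative_error(p))
```

In this test case the relative error falls roughly in proportion to 1/p, which is in line with the statement that contributions away from the stationary point are of lower order than the $1/\sqrt{p}$ main term.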
Now, if, on any closed interval I, $\varphi'(x)$ is continuous
and has no zeros, and if $\varphi(x)$ is strictly monotone in this interval,
then $z = \varphi(x)$ can be introduced as a variable of integration
on that interval, transforming S into
\[ S = \int f(x)\, e^{ip\varphi(x)}\, dx = \int f[\varphi^{-1}(z)]\, e^{ipz}\, dz. \]
If, in addition to the above, $\varphi'(x)$ and $\varphi''(x)$ are continuous
and if $f(x)$ and $f'(x)$ exist and are continuous, this last
integral can be integrated by parts, giving
\[ S = \left[ \frac{f[\varphi^{-1}(z)]\, e^{ipz}}{ip} \right] - \frac{1}{ip} \int e^{ipz}\, \frac{d}{dz} f[\varphi^{-1}(z)]\, dz, \]
and showing that on any such interval I,
\[ S = O(1/p). \]
* Method of Stationary Phase, Journal of Math. & Physics,
vol 24, no 3-4, 1945
Thus, condition (1) will be satisfied if, in the
neighborhood of the stationary point a, the contribution to
the integral is greater than $O(1/p)$.
This is clearly the case when the asymptotic formula
(26) is valid, since there the dependence on p is as $1/\sqrt{p}$.
It can be shown that (26) is valid whenever
$\varphi'(a) = 0$, $\varphi''(a) \neq 0$ and $\varphi'''(x)$ and $f[\varphi^{-1}(z)]$
are of bounded variation in the neighborhood of the stationary
value. Thus, to recapitulate, under these conditions, the
maximum contribution comes from the stationary point and depends
on p as $1/\sqrt{p}$, while the points which are not near the stationary
point contribute terms depending upon p only as $1/p$.
To conclude this brief appendix, it should be remarked
that Wintner gives an extension of (10) which is valid under
the same condition on $f[\varphi^{-1}(z)]$ if the first n derivatives of
$\varphi(x)$ vanish at some point a while $\varphi^{(n+1)}(x)$ does not. These results
could be used to extend the treatment of the high-pass case
given above to the cases in which $a^2 + 2b = 0$, etc.
C. L. DOLPH
C. E. SHANNON
Att.
B-392415 to 392428
[FIG. 3 (8A-392.4-I5) and FIG. 16: plots not recoverable from the scan. Legible labels: $[A(\omega)]^n$, 1st approx., 2nd approx., n = 50, n = 100, 100 db, 125 db.]
Electronic Methods in Telephone Switching
C. E. Shannon
In the recent development of electronic digital computing machines various new
tubes and other electronic devices have been designed which may be of use in
machine switching. In particular the "selectron" tube developed by R. C. A. and the
mercury acoustic delay tank provide large cheap memory devices in which information
can be registered or read off in electronic time intervals (of the order of
microseconds). Since one of the chief functions of the relays and switches in a
telephone exchange is that of memory (e.g. the relays remember which calling and
called lines should be connected together) it is worth while considering the possibility
of using such tubes to replace ordinary electro-mechanical switching equipment.
Suppose we have an exchange (or set of exchanges) serving n subscribers and that
the exchange can handle a peak load of m simultaneous conversations. These may be
between any m pairs of the subscribers. Thus the exchange must be capable of
assuming as many different states as there are ways of selecting m pairs of objects from n.
This can be done in
\[ \frac{n!}{m!\, 2^m\, (n - 2m)!} \]
different ways. For n and m large the logarithm of this is approximately $2m \log n$.
If the logarithm is to the base ten then this is the required memory capacity of the
exchange measured in decimal digits. If the logarithmic base is two the units are
binary digits. A single two-position relay has a capacity of log 2 units (one binary
digit or .30103 decimal digits), while s relays have s log 2 units. A 10 x 10 crossbar
switch has a capacity of 10 log 10, while a single commutator on a panel has capacity
log r, where r is the number of vertical positions of the brushes. Hence the number
of relays required for a pure relay exchange would be
\[ \frac{2m \log n}{\log 2}, \]
the number of 10 x 10 crossbars would be
\[ \frac{2m \log n}{10 \log 10}, \]
etc. To these estimates must be added the losses due to inefficient use of the memory
and also the memory of equipment used for functions other than merely remembering
which connections are being held at a given time.
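The counting argument above can be tried on concrete numbers. The sketch below is illustrative only: the exchange size (n = 10,000 subscribers, m = 500 simultaneous conversations) and the function name `log10_states` are assumptions made for this example. It evaluates the logarithm of n!/(m! 2^m (n - 2m)!) exactly through the log-gamma function and sets it beside the leading estimate 2m log n and the resulting relay and crossbar counts.

```python
import math

def log10_states(n, m):
    """log10 of n! / (m! * 2**m * (n - 2m)!), the number of ways to pick m pairs."""
    ln_count = (math.lgamma(n + 1) - math.lgamma(m + 1)
                - m * math.log(2.0) - math.lgamma(n - 2 * m + 1))
    return ln_count / math.log(10.0)

# Hypothetical exchange size, chosen only for illustration.
n, m = 10_000, 500

exact = log10_states(n, m)                    # decimal digits, exact count
approx = 2 * m * math.log10(n)                # the text's estimate 2m log n
relays = approx / math.log10(2.0)             # each relay stores log 2
crossbars = approx / (10 * math.log10(10.0))  # each 10 x 10 crossbar stores 10 log 10

print(f"exact log10 of state count: {exact:.1f}")
print(f"estimate 2m log n:          {approx:.1f}")
print(f"relays needed (estimate):   {relays:.0f}")
print(f"10 x 10 crossbars needed:   {crossbars:.0f}")
```

Because n!/(n - 2m)! is less than n^(2m) and the count is further divided by m! 2^m, the exact figure always lies below 2m log n; the estimate is a leading-order upper figure rather than a tight value at moderate n and m.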
An ordinary relay is capable of remembering (by a holding circuit) one binary
digit. A pair of vacuum tubes in a flip-flop circuit has the same memory capacity.
The cost of these is of comparable magnitude, and thus if one designed an electronic
telephone exchange by merely changing relays to equivalent vacuum tube circuits the
chief advantage of the electronic circuit would be one of speed, an improvement of
order $10^3$. In many cases this could produce a reduction of cost since frequently many
identical units of a certain type must be supplied because the individual units are slow.
This is apt to be the case with units which are associated with the beginning or end of
calls but need not be used during the conversation. On the other hand equipment to
be used throughout the call would offer less advantage under this tube-for-relay
replacement since the expected duration of calls is long compared to electronic times.
The newer electronic memory devices, however, change this picture considerably.
A selectron tube (when these tubes are in production) may be expected to cost $100 or
less depending on the demand. It is capable of holding 4096 binary digits, giving a
cost per binary digit of the order of 2.5 cents, while the cost of the equivalent relay
may be of the order of 2.5 dollars. Mercury delay lines can store information at a
comparable cost. Thus it is not impossible that a reduction of the order 100 to 1 in
switching equipment cost might be possible by the use of electronic devices, even in
the parts where information must be stored for long periods of time.
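The cost comparison is simple arithmetic on the figures quoted above ($100 per selectron, 4096 binary digits per tube, about $2.50 per relay). A minimal sketch, with variable names chosen for this example:

```python
selectron_cost_dollars = 100.0   # upper estimate quoted in the text
selectron_bits = 4096            # binary digits held by one tube
relay_cost_dollars = 2.5         # one relay holds one binary digit

cents_per_bit = 100.0 * selectron_cost_dollars / selectron_bits
ratio = relay_cost_dollars / (selectron_cost_dollars / selectron_bits)

print(f"{cents_per_bit:.2f} cents per binary digit")   # 2.44
print(f"relay to selectron cost ratio: {ratio:.0f}")   # 102
```

These figures match the quoted "order of 2.5 cents" per digit and the 100 to 1 reduction, before any allowance for the driving and cycling circuits a selectron needs.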
An indication of how such tubes may be used is given in the attached figure.
Fig. 1 is a block diagram of a simplified exchange. The calling parties are connected
to an electronic commutator which samples the speech signals periodically and puts
the various lines in the time division multiplex. The called parties are also connected
in time division multiplex to a single channel by means of an electronic commutator
or distributor. The function of the middle part is to rearrange the samples in such a
way as to provide any desired interconnection between calling and called parties. This
is done by dividing the sampling period into two equal parts. During the first half the
signal plate of the upper selectron is connected by gate 1 into the calling line
multiplex channel. Its windows are caused to open in sequence. Thus at the end of
the first half-cycle the first samples of all the incoming channels have been written on
the face of the tube in their regular order. During the second half-cycle gates 1 and 3
are closed and gates 2 and 4 are opened. Thus the output of the selectron is fed into
the called line multiplex and the windows of the selectron are controlled by the other
selectron tube 2. This tube has registered in a suitable notation the numbers of the
called line desired by the calling line. The windows of this tube are opened
sequentially by the cycling unit and the numbers registered there control the windows
on tube 1 allowing the sample from calling channel 1 to go into the proper place in
the called line TDM.
By a more elaborate system it is possible to make use of the fact that only a small
fraction of the lines will be busy at a given time, as is done in ordinary relay
switching. This can be achieved by only supplying enough places in the distributors
for the peak load. When a call originates the calling and called parties are assigned
idle spaces in the distributor. The place assigned to the called party is registered in
the selectron register corresponding to the place assigned to the calling party.
Some Generalizations of the Sampling Theorem
We have seen that a function of time f(t) containing
no frequencies over W cycles per second can be described by
giving its value at Nyquist intervals (spaced $1/2W$ seconds apart).
It can be reconstructed from these samples using the basic
function $\sin 2\pi Wt / 2\pi Wt$, together with the same function shifted
by integer numbers of Nyquist intervals. We now consider some
generalizations of this result.
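The basic reconstruction can be illustrated numerically. The sketch below is an assumption-laden example rather than anything from the typescript: the test signal, the band W = 1, the truncation to 401 samples, and the function names are all chosen for illustration. It rebuilds f(t) between sample points from $f(t) = \sum_k f(k/2W)\, \sin\pi(2Wt - k) / \pi(2Wt - k)$.

```python
import math

def sinc(x):
    """sin(pi x)/(pi x), with the removable singularity at x = 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, W, t):
    """Sum of f(k/2W) * sinc(2Wt - k) over the stored sample indices k."""
    return sum(value * sinc(2.0 * W * t - k) for k, value in samples.items())

W = 1.0

def f(t):
    # Test signal with spectrum confined to |frequency| < 0.8 W (assumed example).
    return sinc(0.8 * W * t) ** 2

K = 200  # keep samples k = -K..K; the neglected tail is small for this signal
samples = {k: f(k / (2.0 * W)) for k in range(-K, K + 1)}

errors = [abs(reconstruct(samples, W, t) - f(t)) for t in (0.25, 0.7, 1.3)]
print(max(errors))
```

The residual error here comes only from truncating the infinite sum; a signal that decays more slowly would need more samples for the same accuracy.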
In the first place the particular function
$\sin 2\pi Wt / 2\pi Wt$ is by no means necessary for the reconstruction.
In fact any function $\varphi(t)$ which contains all frequencies up to
W is satisfactory. More precisely the spectrum of $\varphi(t)$ should
not vanish over any finite set of frequencies (set of positive
measure) up to W. If $\varphi(t)$ satisfies this condition the original
function f(t) can be reconstructed using $\varphi(t)$ and its shifted
images $\varphi(t + \frac{k}{2W})$. That is, coefficients $a_k$ can be found such
that
\[ f(t) = \sum_{k=-\infty}^{\infty} a_k\, \varphi\!\left( t + \frac{k}{2W} \right). \]
In general the coefficients are not found as easily as in the
special case where $\varphi(t) = \sin 2\pi Wt / 2\pi Wt$ (when they are merely
the values of f(t) at the Nyquist points) but they may be
calculated as follows. Let $F(\omega)$ be the spectrum of f(t) and
$\Phi(\omega)$ be the spectrum of $\varphi(t)$. Expand the function $F(\omega)/\Phi(\omega)$ in
a Fourier series using $-W$ to $+W$ as the fundamental interval.
Thus
\[ \frac{F(\omega)}{\Phi(\omega)} = \sum_k a_k\, e^{ik\omega/2W} \]
or
\[ F(\omega) = \sum_k a_k\, \Phi(\omega)\, e^{ik\omega/2W}. \]
Taking the transform of the equation we obtain the desired
expansion
\[ f(t) = \sum_k a_k\, \varphi\!\left( t + \frac{k}{2W} \right). \]
The coefficients in the expansion can therefore be determined as
the coefficients of a Fourier series expansion of $F(\omega)/\Phi(\omega)$. In
general the functions $\varphi(t + \frac{k}{2W})$ will not form an orthogonal set
and therefore the energy in f(t) cannot be found from $\sum a_k^2$ as it
was in the simple case where $\varphi(t) = \sin 2\pi Wt / 2\pi Wt$.
A physical method of performing this expansion can
also be given. Consider a filter which gives the output
$\sin 2\pi Wt / 2\pi Wt$ when the input is $\varphi(t)$. If the function f(t) is
passed through this filter the amplitudes of the output at
Nyquist intervals will be the desired coefficients. This is
true since this output can be considered as expanded in the
functions $\sin 2\pi Wt / 2\pi Wt$ with the amplitudes as coefficients,
and the inverse filter would restore the original function and
change each of these functions into $\varphi(t)$ at the corresponding
Nyquist point.
A function f(t) can also be determined from a knowledge
of its value and derivative at alternate Nyquist points.
We have here the same number of measurements per second, 2W,
but half of these are ordinates of f(t) and half are derivatives.
The reconstruction of f(t) from these values can be carried out
simply using two basic functions:
\[ \varphi_1(t) = \frac{\sin^2 \pi Wt}{(\pi Wt)^2} \]
\[ \varphi_2(t) = \frac{\sin^2 \pi Wt}{(\pi W)^2\, t}. \]
Both of these lie entirely within the band W. $\varphi_1$ has the
property that it and its first derivative vanish at alternate
Nyquist points (except for t = 0 where the function is 1 and
its first derivative 0). Likewise $\varphi_2$ and $\varphi_2'$ vanish at alternate
Nyquist points except at t = 0 where $\varphi_2 = 0$ and $\varphi_2' = 1$. Thus
we can fit the ordinates of the original function f(t) using $\varphi_1$
and its shifted images (shifted by two Nyquist intervals). The
derivatives of f(t) are fitted using $\varphi_2$ and its shifted images.
Due to the vanishing of these functions none of the fittings
interfere. The function constructed by this process must lie
within the band and have the same values and derivatives as the
original function f(t) at alternate Nyquist points. That there
is only one such function can be shown by arguments similar to
those used in the basic sampling theorem, generalized by breaking
down the spectrum into an even and an odd part.
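The value-and-derivative scheme can be tried numerically. In the sketch below the two basic functions are taken as $\varphi_1(t) = \sin^2 \pi Wt / (\pi Wt)^2$ and $\varphi_2(t) = t\,\varphi_1(t)$, which is the reading of the formulas consistent with the stated properties ($\varphi_2(0) = 0$, $\varphi_2'(0) = 1$). The test signal, the band W = 1, the truncation to 401 sample points, and all function names are assumptions for this example.

```python
import math

def sinc(x):
    """sin(pi x)/(pi x), with the removable singularity at x = 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def dsinc(x):
    """Derivative of sinc with respect to its argument."""
    return 0.0 if x == 0.0 else (math.cos(math.pi * x) - sinc(x)) / x

def phi1(t, W):
    return sinc(W * t) ** 2        # value 1 and slope 0 at t = 0

def phi2(t, W):
    return t * sinc(W * t) ** 2    # value 0 and slope 1 at t = 0

W = 1.0
B = 0.8 * W  # test signal occupies |frequency| < B, inside the band W

def f(t):
    return sinc(B * t) ** 2

def df(t):
    return 2.0 * B * sinc(B * t) * dsinc(B * t)

def reconstruct(t, W, K):
    """Fit values with phi1 and slopes with phi2 at the points k/W, k = -K..K."""
    total = 0.0
    for k in range(-K, K + 1):
        tk = k / W                 # alternate Nyquist points, spaced 1/W apart
        total += f(tk) * phi1(t - tk, W) + df(tk) * phi2(t - tk, W)
    return total

errors = [abs(reconstruct(t, W, 200) - f(t)) for t in (0.25, 0.6, 1.4)]
print(max(errors))
```

Only 2W numbers per second are used, as in ordinary sampling, but they are taken in pairs (ordinate and slope) at half as many instants.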
It is possible to carry this further and determine a
function from knowledge of its value and first (n - 1)
derivatives at points separated by n Nyquist intervals. In
this case the basic functions are
sin11 (Sgfc)
*1 =
n
(2nWtxn
1 n '
_ sinn (agt)
1 n '
s.nn (2^t}
n-2
/2nWt%
K~ n~";
rn 2nWt
n
These functions possess the properties:
1. They lie within the band W.
2. They vanish at $t = \frac{nK}{2W}$, $K = \pm 1, \pm 2, \ldots$
(that is at n-th Nyquist points) and also their
1st, 2nd, ..., (n-1)st derivatives.
3. At t = 0, all derivatives of $\varphi_s$ vanish except the s-th
derivative which is 1.
Consequently we can reconstruct f(t) by using $\varphi_s$ to
adjust the s-th derivatives (s = 0, 1, ..., n-1) and these adjustments
will not interfere.
The functions $\varphi_s$ and their spectra are shown in Fig. 1
for the cases n = 1, 2, 3.
C. E. SHANNON
Att. Fig. 1
March 4, 1948
The Normal Ergodic Ensembles of Functions
Among the possible probability distributions in a one-
dimensional space certain ones are of special importance because
of their simple mathematical properties and frequent occurrence
in the physical world. The most important of these is the
normal or Gaussian distribution with a density function:
$$\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2} \frac{x^2}{\sigma^2}\right)$$
In an n-dimensional space the most important distribution func-
tion is an n-dimensional generalization of this, the n-
dimensional normal distribution:
$$\frac{|a_{ij}|^{1/2}}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2} \sum_{i,j} a_{ij} x_i x_j\right)$$
Here $a_{ij}$ is the associated quadratic form and $|a_{ij}|$ the
determinant of this form. This form is positive definite and
the surfaces of constant probability are found by setting
the argument of the exponential function equal to a constant
$$\sum_{i,j} a_{ij} x_i x_j = C$$
and are therefore coaxial ellipsoids in the space. The direc-
tions of the axes of these ellipsoids are those of the eigen-
vectors of the form $a_{ij}$ and the lengths are inversely proportional
to the square roots of the corresponding eigenvalues. By a rotation
of axes the new
coordinate system can be lined up with these directions and the
distribution function reduced to
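$$\frac{(\lambda_1 \lambda_2 \cdots \lambda_n)^{1/2}}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2} \sum_{i=1}^{n} \lambda_i y_i^2\right)$$
where the $\lambda_i$ are the (positive) eigenvalues and the $y_i$ are the
new coordinates. The form $a_{ij}$ being positive definite has an
inverse $A_{ij}$ which is also positive definite with eigenvalues
$1/\lambda_i$.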
The properties of the n-dimensional normal distribution
which give it particular mathematical importance are the
following.
1. If $x_i$ and $y_i$ are two chance vector variables, which
are independent and distributed according to n-dimensional
normal distributions with quadratic forms $a_{ij}$ and $b_{ij}$ (inverses
$A_{ij}$ and $B_{ij}$), then the chance vector variable $z_i = x_i + y_i$ is
also distributed normally with the form $c_{ij}$ whose inverse is
$$C_{ij} = A_{ij} + B_{ij}.$$
2. If $x_i$ is a normally distributed vector variable and
$y_i = \sum_j r_{ij} x_j$ is a vector variable which is a linear operation
on $x_i$ (possibly of smaller dimension than n) then $y_i$ is normally
distributed with the inverse form
$$B_{ij} = \sum_{s,t} r_{is} r_{jt} A_{st}.$$
3. Under certain quite broad conditions the resultant of
a large number of small chance vector variables $x_i^s$ (s = 1, 2, ..., N)
with arbitrary distribution functions, which are independent,
gives a normal distribution for
$$x_i = \sum_{s=1}^{N} x_i^s$$
with
[equation illegible]
providing no term of the sum contributes more than a small
fraction to any B.
4. If the a priori probabilities for each of two
independent vectors $x_i$ and $y_i$ are both normal, the a posteriori
probability of $x_i$ when we know the sum $x_i + y_i = z_i$ is
normally distributed (about a displaced mean, however).
5. The mean value of $x_i x_j$ for $x_i$ normal is given by
$$\overline{x_i x_j} = A_{ij}.$$
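Properties 1 and 5 lend themselves to a quick Monte Carlo check. The sketch below makes several assumptions not in the memorandum: two dimensions, zero means, the particular inverse forms `A` and `B` (which play the role of covariance matrices), the sample size, and the random seed.

```python
import math
import random

def sample_normal_2d(cov, rng):
    """Draw one zero-mean 2-D normal vector with covariance `cov`,
    using a hand-written 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[0][1] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (l11 * u, l21 * u + l22 * v)

def second_moments(samples):
    """Sample estimate of the mean value of x_i x_j."""
    n = len(samples)
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(2)]
            for i in range(2)]

rng = random.Random(0)           # fixed seed, chosen arbitrarily
A = [[2.0, 0.6], [0.6, 1.0]]     # assumed inverse form for x
B = [[1.0, -0.3], [-0.3, 0.5]]   # assumed inverse form for y
n = 100_000

xs = [sample_normal_2d(A, rng) for _ in range(n)]
ys = [sample_normal_2d(B, rng) for _ in range(n)]
zs = [(x[0] + y[0], x[1] + y[1]) for x, y in zip(xs, ys)]

A_hat = second_moments(xs)   # property 5: should be close to A
C_hat = second_moments(zs)   # property 1: should be close to A + B
print(A_hat, C_hat)
```

The estimates agree with A and A + B only to within sampling error, which falls off as the inverse square root of the number of samples.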
Among the many possible ergodic ensembles of functions
$f_\alpha(t)$ there is also a certain class of particular mathematical
and physical importance. This class of ensembles can be con-
sidered a generalization of the n-dimensional normal distribution
to infinite dimensional function spaces ergodic under trans-
lations in time. We shall call these normal ergodic ensembles
of functions. They are completely specified by giving their
power spectra $P(\omega)$ or their autocorrelation functions $A(\tau)$
which are the Fourier transforms of the power spectra. The
normal ergodic ensembles can be defined in various ways. They
occur physically when we pass a thermal noise through a filter,
shaping the power spectrum to $P(\omega) = |Y(\omega)|^2$, $Y(\omega)$ being the
admittance of the filter.
In the literature on noise these ensembles are often
treated in a loose somewhat illogical fashion by using either
of two "representations." The first representation is
$$\sum_{n=0}^{\infty} \sqrt{P(n \Delta f) \, \Delta f} \; \cos(n \Delta f \, t + \theta_n).$$
The $\theta_n$ are all uniformly and independently distributed over all
values from 0 to $2\pi$. This representation amounts to making the
noise the sum of a large number of small sinusoidal waves with
random phases, and amplitudes adjusted to give the proper power
density in any small frequency range. The frequency increment
between adjacent waves $\Delta f$ is supposedly very small and in use
one evaluates any desired statistic of this set of functions and
determines the limit approached by this statistic as $\Delta f \to 0$.
This limit is taken to be the desired statistic of the normal
ergodic ensemble. The second representation is similar but uses
normally distributed amplitudes $a_n$ whose variance $\sigma^2$ is equal
to $P(\omega)$:
$$\sum_{n} a_n \sqrt{\Delta f} \; \cos(n \Delta f \, t + \theta_n).$$
Actually these "representations" will not give the
correct answer in all cases. For example, if we ask what
fraction of the functions in the representation ensemble
are periodic, we find that all are, so the probability is unity,
and the limit as $\Delta f \to 0$ is also therefore unity, while almost
none of the functions in the ergodic normal ensemble are periodic.
However it can be shown that if we restrict ourselves to what we
have called physical statistics, the answer will be identical;
the normal ergodic ensemble is the physical limit of either of
the above ensembles as $\Delta f \to 0$.
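The periodicity objection can be seen directly in a small experiment. This is a sketch only: the spectrum `power_spectrum`, the frequency increment, the truncation of the infinite sum to 200 terms, and the seed are assumptions made for illustration. Every sample function of the first representation repeats with period $2\pi/\Delta f$.

```python
import math
import random

def power_spectrum(w):
    # Hypothetical power spectrum P(w), chosen only for illustration.
    return 1.0 / (1.0 + w * w)

def make_sample_function(delta_f, n_terms, rng):
    """One member of the first representation: a sum of sinusoids with
    independent uniform phases and amplitudes sqrt(P(n df) df)."""
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_terms)]
    amps = [math.sqrt(power_spectrum(n * delta_f) * delta_f)
            for n in range(n_terms)]

    def x(t):
        return sum(a * math.cos(n * delta_f * t + th)
                   for n, (a, th) in enumerate(zip(amps, phases)))

    return x

rng = random.Random(1)          # fixed seed, chosen arbitrarily
delta_f = 0.05
x = make_sample_function(delta_f, 200, rng)

period = 2.0 * math.pi / delta_f
max_gap = max(abs(x(t + period) - x(t)) for t in [0.0, 1.3, 7.7, 42.0])
print(max_gap)   # zero up to rounding error
```

Shrinking `delta_f` lengthens the period but never removes it, which is why a statistic such as "fraction of periodic functions" does not converge to the value for the true ensemble.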
A more logical definition of a normal ergodic ensemble
can be given as follows. We divide the frequency range up into
unit intervals and construct the sequence of "flat" ensembles
for these intervals. These will be given by
$$\sum_n a_n \sin nt.$$
These ensembles are passed through shaping filters to give the
proper power spectrum in the interval in question and the results
added.
The normal ergodic ensembles have properties analogous
to the n-dimensional normal distributions which we have given.
We have
Theorem: The sum of two functions $f_\alpha(t) + g_\beta(t)$ where f and g
are from normal ergodic ensembles with spectra $P_1$
and $P_2$ is normal ergodic with spectrum $P_1 + P_2$.
Theorem: The output of any linear invariant transducer driven
by a normal ergodic ensemble is normal ergodic with
spectrum $|Y(\omega)|^2 P(\omega)$.
Theorem: Any finite dimensional linear operation on a normal
ergodic ensemble gives a normally distributed vector.
March 15, 1948
C. E. SHANNON
Systems Which Approach the Ideal as $P/N \to \infty$
We will show that it is possible to construct an
instantaneous system for sufficiently large P/N for transmitting
a sequence of binary digits such that the frequency of errors
is arbitrarily small and the power required only slightly
greater in db than the ideal for the corrected rate of trans-
mission. More precisely we have the
Theorem: Given any $\epsilon > 0$ and $\delta > 0$ we can transmit binary digits
on an instantaneous basis with frequency of errors
$< \epsilon$ and corrected rate of transmission
$$R > W \log\left[1 + (1 - \delta) \frac{P}{N}\right]$$
The system to be used is of PCM type with an extremely large
number of amplitude levels. Let there be $2^s$ levels, and number
them with a binary notation, but in the Stibitz type code, so
that only one binary digit changes on going to an adjacent
level. If we are in error by d levels, at most d binary digits
of the s will be incorrect. If there are many levels in the $\sigma$
distance ($\sqrt{N}$) of the noise the expected number of errors will
be approximately
[equation illegible]
We take P/N large enough so that $\epsilon s > a$.
Thus the frequency of
errors in our final result will be $< \epsilon$. The levels should not
be spaced uniformly but according to the density of a normal
distribution. If this is done the received signal will be
nearly Gaussian with $\sigma = \sqrt{P + N}$ and the corrected rate of
transmission
$$R > W \log\left[1 + (1 - \delta) \frac{P}{N}\right]$$
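The numbering of levels can be illustrated with a short sketch. It assumes that the "Stibitz type code" is the reflected binary code, now usually called the Gray code; the function names and the choice s = 6 are illustrative only.

```python
def gray(n):
    """Reflected binary (Gray) code of the level number n."""
    return n ^ (n >> 1)

def bits_changed(a, b):
    """Number of binary digits in which two code words differ."""
    return bin(a ^ b).count("1")

s = 6                     # assumed number of binary digits per level
levels = range(2 ** s)

# Adjacent levels differ in exactly one binary digit.
adjacent_ok = all(bits_changed(gray(k), gray(k + 1)) == 1
                  for k in range(2 ** s - 1))

# An error of d levels corrupts at most d binary digits.
bounded_ok = all(bits_changed(gray(k), gray(k + d)) <= d
                 for k in levels for d in range(1, 2 ** s - k))

print([format(gray(k), "03b") for k in range(8)])
print(adjacent_ok, bounded_ok)
```

The second property follows from the first: moving d levels is d single-digit changes, some of which may cancel.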
C. E. SHANNON
March 29, 1948
Theorems on Statistical Sequences
If it is possible to go from any state with P > 0
to any other along a path of probability p > 0, the system is
ergodic and the strong law of large numbers can be applied.
Thus the number of times a given path $p_{ij}$ in the network is
traversed in a long sequence of length N is about proportional
to the probability of being at i and then choosing this path,
$P_i p_{ij} N$. If N is large enough the probability of percentage
error $\pm \delta$ in this is less than $\epsilon$ so that for all but a set of
small probability the actual numbers lie within the limits
$$(P_i p_{ij} \pm \delta) N.$$
Hence nearly all sequences have a probability p which, within
limits $\pm \delta$, is given by
$$p = \prod_{i,j} p_{ij}^{(P_i p_{ij} \pm \delta) N}$$
and $\frac{\log p}{N}$ is limited by
$$\frac{\log p}{N} = \sum_{i,j} (P_i p_{ij} \pm \delta) \log p_{ij}$$
or
$$\left| \frac{\log p}{N} - \sum_{i,j} P_i p_{ij} \log p_{ij} \right| < \eta.$$
Thus we have
Theorem: For almost all sequences
$$\lim_{N \to \infty} -\frac{\log p}{N} = H = -\sum_{i,j} P_i p_{ij} \log p_{ij}$$
where p is the probability of the sequence having the block
of length N starting at the first position.
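The theorem can be tried out on a small example. The two-state transition probabilities, the sequence length, the seed, and the use of logarithms to base 2 below are all assumptions made for this sketch.

```python
import math
import random

# Hypothetical transition probabilities p[i][j] of a two-state system.
p = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(p):
    # For two states the equilibrium satisfies P_0 p_01 = P_1 p_10.
    P0 = p[1][0] / (p[0][1] + p[1][0])
    return [P0, 1.0 - P0]

def entropy_rate(p):
    """H = - sum_ij P_i p_ij log2 p_ij."""
    P = stationary(p)
    return -sum(P[i] * p[i][j] * math.log2(p[i][j])
                for i in range(2) for j in range(2))

def minus_log_p_over_n(p, n, rng, start=0):
    """Generate one sequence of n transitions and return -(log2 p)/n,
    where p is the probability of that particular sequence."""
    state, total = start, 0.0
    for _ in range(n):
        nxt = 0 if rng.random() < p[state][0] else 1
        total -= math.log2(p[state][nxt])
        state = nxt
    return total / n

H = entropy_rate(p)
estimate = minus_log_p_over_n(p, 200_000, random.Random(0))
print(H, estimate)
```

For a single long sequence the two printed numbers should agree closely, and the agreement improves as the sequence length grows.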
Thus for all but a set of blocks of probability $< \epsilon$
and for N large enough
$$(H - \eta) N < -\log p < (H + \eta) N$$
$$\sum p \, (H - \eta) N < -\sum p \log p < \sum p \, (H + \eta) N$$
where we have summed over all but the set of small probability.
$$\sum p \, (H + \eta) N = (H + \eta) N \sum p \le (H + \eta) N$$
and
$$\sum p \, (H - \eta) N = (H - \eta) N \sum p \ge (H - \eta) N (1 - \epsilon).$$
For the set of small probability
[equation illegible]
since this is maximized for $\sum p = \epsilon$ by making all p equal, and
the number of them [illegible]. But this is dominated by
[equation illegible]
with $\eta$ as small as desired for sufficiently large N and small $\epsilon$.
Hence this does not affect the sum in the limit as $N \to \infty$ and
we have the
Theorem:
$$\lim_{L \to \infty} -\frac{1}{L} \sum p(B_L) \log p(B_L) = H$$
where $p(B_L)$ is the probability of block $B_L$ of length L, and
the sum is over all possible blocks.
We now prove the
Theorem:
$$H = -\sum_{i,j} p(B_i, S_j) \log p_{B_i}(S_j) = \lim -\sum_{i,j} q(B_i, S_j) \log q_{B_i}(S_j)$$
where $p(B_i, S_j)$ is the probability of block $B_i$ followed by $S_j$ and
$p_{B_i}(S_j)$ is the conditional probability of $S_j$ after the block $B_i$
is known to occur. $q(B_i, S_j)$ is the probability when $B_i$ is
computed on the basis of any initial state probabilities, not
necessarily the proper ones, and $q_{B_i}(S_j)$ the corresponding condi-
tional probabilities.
The first equality is true since we may sum first on
all $B_i$ leading to a given state k. The terms $p_{B_i}(S_j)$ are then
all equal to $p_{kj}$ and the terms $p(B_i, S_j)$ sum to $P_k p_{kj}$, which gives the
desired result.
If the q's are used, the $q_{B_i}(S_j)$ are still $p_{kj}$ where
k is the state in which $B_i$ ends.
$$\sum q(B_i, S_j) \to p_{kj} \sum p(B_i)$$
since any initial distribution tends toward equilibrium.
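We have shown that apart from a set of small probability,
the probabilities of blocks of length N lie within the limits
$$2^{-(H + \delta) N} < p < 2^{-(H - \delta) N}$$
where $\delta$ can be made small by taking N large enough. Let the
maximum number of blocks of length N when we delete a set of
measure $\epsilon$ be $G_\epsilon(N)$. Then
$$\sum_{\text{remaining set}} p = (1 - \epsilon)$$
$$G_\epsilon(N) \, p_{\max} = G_\epsilon(N) \, 2^{-(H - \delta) N} \ge 1 - \epsilon$$
$$\log G_\epsilon(N) > (H - \delta) N + \log(1 - \epsilon)$$
Hence
$$\lim_{N \to \infty} \frac{\log G_\epsilon(N)}{N} = \psi(\epsilon) \ge H.$$
Similarly
$$1 > \sum p > G_\epsilon(N) \, p_{\min}$$
from which we obtain
$$\log G_\epsilon(N) < (H + \delta) N$$
and
$$\psi(\epsilon) \le H.$$
Hence we have
Theorem: $\psi(\epsilon) = H$ for $\epsilon \ne 0, 1$.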
The fact that for large N nearly all blocks have a
probability limited by
$$\left| \frac{\log p}{N} + H \right| < \delta$$
does not imply that those probabilities approach equality.
In fact they will generally diverge from one another but the
db range becomes small compared to N, since for p's satisfying
this inequality
$$\frac{\log p_{\max}}{N} - \frac{\log p_{\min}}{N} = \frac{1}{N} \log \frac{p_{\max}}{p_{\min}} < 2\delta.$$
It is possible to show, however, that there exists among the
blocks of length N a subset, all of equal probability, which
have the same growth with N as the set including all blocks
except those of small probability totaling less than $\epsilon$: namely,
the subset will contain more than $2^{(H - \delta) N}$ elements with $\delta$
arbitrarily small.
Consider all blocks beginning in a given state, say
state 1, and ending in this state. Let these blocks $B_1$,
$B_2$, ... have lengths $n_1, n_2, \ldots, n_i, \ldots$ and conditional
probabilities $p_1, p_2, p_3, \ldots$ when we start from state 1.
We first prove
Theorem: $\sum p_i n_i = P_1^{-1}$
The first part is true since the ergodic character of the system
makes the inverse frequency of occurrence of state 1 equal
to the mean distance between its occurrences, $\sum p_i n_i$. The
second part is true since almost all blocks of large length N
have approximated the proper frequency of each $B_i$.
Now we return to the construction of a subset of growth
$2^{(H - \delta) N}$ all of equal probability. Let us choose integers
$a_i$ as close as possible to
[expression illegible]
and construct sequences with $a_i$ of the block $B_i$. The number
of blocks is then
[expression illegible]
and the number of sequences:
[expression illegible]
The growth is then in terms of symbols
[expression illegible]
This proves the following:
Theorem: Given $\delta > 0$ there exists a set of M blocks of length N
(when N is sufficiently large) such that
$$M > 2^{(H - \delta) N}$$
and each block has the same probability, and starts and ends in
the same state, which can be chosen arbitrarily.
In case the system is not ergodic but made up of a
finite number of ergodic systems:
$$\Gamma = \sum c_i \Gamma_i$$
each $\Gamma_i$ will have a rate $H_i$ which we may assume arranged in a
non-increasing sequence
$$H_1 \ge H_2 \ge \cdots \ge H_n.$$
The function $\psi(\epsilon)$ then becomes a decreasing step function in the
manner indicated by the following
Theorem: In the case considered
$$\psi(\epsilon) = H_k \text{ in the interval } \sum_{i=1}^{k-1} c_i < \epsilon < \sum_{i=1}^{k} c_i.$$
For if $\epsilon$ is in the range indicated we must take a set
of positive probabilities from at least one of $\Gamma_1, \ldots, \Gamma_k$.
This gives a growth of type
[expression illegible]
at least, and can be limited to this by choosing all sequences
[expression illegible]
The quantity
[expression illegible]
will be called the mean statistical rate for the system.
C. E. SHANNON
April 26, 1948
Samples of Statistical English
C. E. Shannon
A number of samples of statistical English including
probability structure out to four words are given below. These
were constructed by starting off with three words from a book.
These three words are shown to someone who fits them in a
reasonable English sentence and writes down the word following
the three. The first word is then covered up and the process
repeated with a different person, etc. If the imagined sentence
ends after the added word, the person writing the word adds a
period. For samples bearing a title the participants were told
that this was the subject dealt with. These samples may be
compared with those in "A Mathematical Theory of Communication"
where less statistical structure is included.
The samples given here were obtained for the most
part, with the aid of J. R. Pierce, B. McMillan, C. C. Cutler
and W. E. Mathews. A few of the samples were obtained from
other sources (contemporary literature, etc.) and are included
for comparison. The reader may try his skill at guessing which
are statistically constructed. The true sources are given at
the end.
1. This was the first. The second time it happened without
his approval. Nevertheless it cannot be done. It could
hardly have been the only living veteran of the foreign
power had stated that never more could happen. Conse-
quently people seldom try it.
2. John now disported a fine new hat. I paid plenty for the
food. When cooked asparagus has a delicious flavor sug-
gesting apples. If anyone wants my wife or any other
physicist would not believe my own eyes. I would believe
my own word.
3. That was a relief whenever you be let your mind go free
who knows if that pork chop I took with my cup of tea
after was quite good with the heat I couldn't smell any-
thing off it I'm sure that queer looking man in the
4. In a few days was the minimum amount of money remaining to
the end. However everyone knows the meaning implied. It
was true when Cutler says that we should proceed care-
fully. When you love yourself too much. The woman who
accosted
5. Fourscore and twenty years passed before we could meet them
that isn't already done should have been a good son is
going fast according to the teacher of his ability. His
intelligence sufficed for the time. This cannot change
much.
6. Even the killing was atrociously perpretated by the
cruelest treatment that a small boy jumped over the hedge
and buried her. A grave fault of many approaches to the
furthermost reaches of the state. Politics and business
are becoming lost to the
7. It is an Italian ox mouth dish. The only thing in the
room is worms. I am the director of the seminar. In an
evolving hemisphere. C'est Monsieur Jardin. I am a
patient. Oh my dear Plapsen, you are my dearest Klapsen.
8. He took it with many other matters are more apparent if
they think so. Is there a reason for supposing that
most people don't. Nevertheless sex is absolutely neces-
sary as though the electron diffraction camera plate up
on the top surface of
9. Fifteen years before the mast, he ever had eaten. Try
it and see, I believe that whatever arises a fund has
been accumulated sufficiently in the near future holds
many surprises. No man can judge his actions by his wife
Susie .
10. I forget whether he went on and on. Finally he stipulated
that this must stop immediately after this. The last time
I saw him when she lived. It happened one frosty look of
trees waving gracefully against the wall. You never can
11. When I bought my wife a long time ago. I knew that it
wasn't faster when he didn't eat or drink a toast to
John Doe, otherwise known as McMillan's theorem.
Whatever the nature of Christ's teachings. Go far into
12. McMillan's Theorem
McMillan's theorem states that whenever electrons diffuse
in vacua. Conversely impurities of a cathode. No sub-
stitution of variables in the equation relating these
quantities. Functions relating hypergeometric series
with confluent terms converging to limits uniformly
expanding rationally to represent any function.
13. House Cleaning
First empty the furniture of the master bedroom and bath.
Toilets are to be washed after polishing doorknobs the
rest of the room. Washing windows semi-annually is to be
taken by small aids such as husbands are prone to omit
14. Epiminondas
Epiminondas was one who was powerful especially on land
and sea. He was the leader of great fleet maneuvers and
open sea battles against Pelopidas but had been struck on
the head during the second Punic war because of the wreck
of an armored frigate.
15. Salaries
Money isn't everything. However, we need considerably
more incentive to produce efficiently. On the other hand
too little and too late to suggest a raise without a reason
for remuneration obviously less than they need although
they really are extremely meager.
16. Murder Story
When I killed her I stabbed Claude between his powerful
jaws clamped cruelly together. Screaming loudly despite
fatal consequences in the struggle for life began ebbing
as he coughed hallowly spitting blood from his ears.
Burial seemed unnecessary since further division was
necessary.
The sources are: 3, from "Ulysses" by James Joyce,
page 748; 7 and 14 are the conversation and writings of two
schizophrenic patients (quoted from Bleuler, "A Textbook of
Psychiatry"). All others constructed by statistical means.
C. E. SHANNON
June 11, 1948
The Department of Defense
RESEARCH AND DEVELOPMENT BOARD
Washington 25, D. C.
Prepared by
THE PANEL OF COMMUNICATIONS OF
THE COMMITTEE ON ELECTRONICS
Approved:
Chairman
5. SIGNIFICANCE AND APPLICATION
C. E. Shannon
Bell Telephone Laboratories
Murray Hill, N. J.
1. Introduction.
A general communication system is shown in Figure 3. An information source
produces a message. This is encoded in a transmitter to produce a signal suitable for
transmission over the channel. During transmission the signal may be perturbed by
noise. The perturbed signal is decoded or demodulated at the receiver to recover, as
well as possible, the original message.
The situation is roughly analogous to a transportation system for transporting physical
goods from one point to another. We can imagine, for example, a lumber mill producing
lumber at an average rate of R cubic feet per second and a conveyor system capable of
transporting C cubic feet per second. If R is greater than C the full output of the mill
cannot possibly be carried on the conveyor. On the other hand, if R is less than or equal
to C it may or may not be possible, depending on whether the lumber can be efficiently
packed in the available space of the conveyer. However, if we allow ourselves to saw
the lumber up into suitable sizes and shapes we can always approach 100 per cent effi-
ciency in packing. In this case we must, of course, supply a carpenter shop at the other
end of the conveyor to reassemble the lumber in its original form before passing it on
to the consumer.
If the analogy is sound we might hope to define two parameters R and C associated
with an information source and a channel, respectively. R should measure, in some
sense, how much information is produced per second by the source, and C the capacity
of the channel when used in the most efficient manner for transmitting information. We
would expect then that if R > C the full output of the source cannot be transmitted satis-
factorily. If R ≤ C it should be possible to transmit the output of the source by proper
encoding and decoding at transmitter and receiver. It turns out that it is possible to
define quantities R and C which measure these information rates and capacities and
satisfy the desired relationships. We will attempt to show how this can be done without,
however, giving mathematical proofs of the results.¹
2. The Information Source.
The first problem is that of clarifying the nature of "information" and finding a
measure of the rate of production for an information source.
Information involves basically the concept of "choice." An information source
chooses one particular message from a set of possible messages. If there were only
one possible message there would be no communication problem. The amount of informa-
tion produced by a source must evidently be related to the range of choice available.
¹For mathematical details, see Shannon, C. E., "A Mathematical Theory of Commu-
nication," Bell System Technical Journal, July and October, 1948. See also Shannon, C. E.,
"Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).
The simplest possible choice is a choice from two equally likely possibilities, say
0 or 1. We shall call the corresponding unit of information a binary digit or "bit." A
relay or flip-flop circuit has two possible states and is capable of storing one bit of
information.
A device which chooses at random from 0 or 1 making one choice each second is
considered to be producing information at rate R of one bit per second. Such a source
produces a "message" which is a random sequence of O's and l's.
A choice from, say, 32 equally likely possibilities can be considered as a series of five
choices, each from two equally likely possibilities, and, therefore, should correspond to
five bits. More generally, a choice from n equally likely possibilities represents $\log_2 n$
bits.
Suppose now that the various possible choices have different probabilities of occur-
rence, say $p_1, p_2, \ldots, p_n$. How much information is produced when a choice is made under
these circumstances? One feels intuitively that less "choice" is involved in a device
which chooses between 0 and 1 with probabilities .01 and .99 than in one which chooses
with equal probabilities. In the former case the result is almost sure to be 1.
The following example shows that by proper encoding an average compression can be
obtained by using the probabilities $p_1, p_2, \ldots, p_n$. Suppose there are four possible choices
A, B, C, D with probabilities $p_A = 1/2$, $p_B = 1/4$, $p_C = 1/8$, $p_D = 1/8$. If we use a simple
direct code into binary digits:
A = 00 B = 01 C = 10 D = 11,
we use two binary digits per letter. On the other hand, using the following code where
more probable letters are given short codes and less probable letters longer codes, we
obtain an average saving
A = 0 B = 10 C = 110 D = 111.
This is a reversible code; the original text can be recovered from the encoded sequences
as is readily verified. With this code we need, on the average, only
(1/2 x 1 + 1/4 x 2 + 1/8 x 3 + 1/8 x 3) = 1 3/4
binary digits per letter. We may say then that a choice with probabilities 1/2, 1/4, 1/8,
1/8 corresponds to 1 3/4 bits of information. If an information source were producing
a sequence of the letters A, B, C, D with these probabilities we could encode it into a
sequence of binary digits in which 1 3/4 binary digits are used on the average for each
letter of message.
A general analysis of the situation shows that if the letters are chosen with probabili-
ties $p_1, p_2, \ldots, p_n$ then it is possible to encode into binary digits using
$$H = -\sum_{i} p_i \log_2 p_i$$
binary digits per letter of message on the average, and there is no method of reversible
encoding using less. This H then is the equivalent number of bits per letter, and, if the
source produces n letters per second, R = nH is the rate of production in bits per second.
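The four-letter example can be worked through mechanically. In this sketch the code table and probabilities are those of the text, while the helper names `encode` and `decode` and the test message are hypothetical.

```python
import math

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(message):
    """Concatenate the code words of the letters."""
    return "".join(code[ch] for ch in message)

def decode(bits):
    """Recover the letters; no code word is a prefix of another,
    so the first match while scanning is always the right one."""
    inverse = {word: letter for letter, word in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)

# Average number of binary digits per letter with this code.
avg_len = sum(probs[ch] * len(code[ch]) for ch in probs)

# H = - sum p_i log2 p_i for the same probabilities.
H = -sum(q * math.log2(q) for q in probs.values())

print(encode("ABCD"), decode(encode("ABACADAB")), avg_len, H)
```

For these probabilities the average code length equals H, which is possible because every probability is a power of 1/2.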
In the case of English text the statistical structure is more involved. There are the
various letter probabilities $p_i$, but, also, there are statistical influences between nearby
letters. For example, the letter T is more often followed by H than by any other letter,
a Q is almost invariably followed by U, etc. In such cases there is a more general formula
for calculating the equivalent number of bits per letter of message. Let $p(i, j, \ldots, s)$ be
the probability in the language of the sequence of letters i, j, ..., s. Then we define $G_n$ by
$$G_n = -\frac{1}{n} \sum_{i, j, \ldots, s} p(i, j, \ldots, s) \log_2 p(i, j, \ldots, s)$$
where the sum is over all sequences of letters which are just n letters long. The se-
quence $G_1, G_2, \ldots, G_n, \ldots$ represents a series of approximations to the desired H which
takes into account more and more of the statistical structure as we proceed along the
sequence. The information per letter of message can be defined by the limiting value of
the G's.
$$H = \lim_{n \to \infty} G_n$$
It can be shown that H has the desired properties; namely, we can encode the messages
from the source into binary digits using H binary digits per letter on the average, and no
method of encoding uses less.
For the English language H has been estimated at roughly 2 bits per letter, taking
account only of the statistical structure out to about 6 or 8 letters.
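The approximations $G_n$ can be estimated from a sample of text by replacing the probabilities with observed frequencies of n-letter sequences. The sketch below is an illustration under that assumption; the function name `G` and the toy sample are hypothetical, and a realistic estimate for English would need a very large sample.

```python
import math
from collections import Counter

def G(text, n):
    """Estimate G_n = -(1/n) sum p log2 p, with p taken as the relative
    frequency of each n-letter sequence observed in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return -sum(c / total * math.log2(c / total)
                for c in counts.values()) / n

# Toy "language" with strong influence between neighbouring letters:
# every a is followed by b and every b by a.
sample = "ab" * 50
g1, g2 = G(sample, 1), G(sample, 2)
print(g1, g2)
```

Here the single-letter estimate is one bit per letter, while taking pairs into account already halves it, since the second letter of a pair is determined by the first.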
If the messages produced by the information source are continuous functions of time
as in speech or television transmission, the situation is much more involved and we will
not discuss it in detail. It is still possible to assign a rate of production of information
in bits per second to such a source, but the rate now depends on other considerations.
With continuous functions as messages, exact reproduction is not generally required and
the rate R depends on the amount and nature of the discrepancy which can be tolerated
between the original and recovered messages. The tolerable discrepancy in turn is
determined by the final destination of the messages. With speech, for example, the toler-
able errors depend on the structure of the human ear and brain.
Although the mathematical problems involved in defining the rate for a continuous
source have been completely solved, it is in practical cases very difficult to estimate R.
The following calculation may be of some interest, however. Suppose we are interested
only in transmitting English speech (no music or other sounds), and the quality require-
ments on reproduction are only that it be intelligible as to meaning. Personal accents,
inflections, etc., can be lost in the process of transmission. In such a case we could, at
least in principle, transmit by the following scheme. A device is constructed at the trans-
mitter which prints the English text corresponding to the spoken words. These can be
translated into binary digits in the ratio of about two binary digits per letter, or 2 × 4.5 = 9
per word. Taking 100 words per minute as a reasonable talking speed we obtain 900 bits
per minute or 15 bits per second as an estimate of the rate for English speech when in-
telligibility is the only fidelity requirement.
3. The Capacity of a Channel.
We now consider the problem of defining the capacity C of a channel for transmitting
information. Since we have measured the rate of production for an information source in
[...] transmitted over a given channel?
In some cases the answer is simple. With a teletype [...] per second,
we can send 5n bits per second.
Suppose now that the channel is defined as follows: we can use for signals any func-
tions of time f(t) which lie within a certain band of frequencies, W cycles per second wide.
It is known that a function of this type can be specified by giving its values at a series of
equally spaced sampling points 1/2W seconds apart. Thus we may say that such a function
has 2W degrees of freedom, or dimensions, per second.
If there is no noise whatever [...]
Even when there is noise, if we place no limitations on the transmitter power, the
capacity will be infinite, for we may [...]
number of different amplitude levels [...] The capacity depends, of
[...] limitation.
The simplest type of noise is white [...] The probability
distribution of amplitudes is Gaussian, and the spectrum is flat [...]
into a unit resistance.
The simplest limitation on transmitter power is [...]
[...] parameters W, P, and N,
the capacity C can be calculated. It turns out to be
$$C = W \log_2 \frac{P + N}{N} \quad \text{(bits per second).}$$
[...] $\sqrt{\frac{P + N}{N}}$
different amplitudes at each sample point. In a time T there will be 2TW independent
samples. Thus, there are approximately
$$M = \left( \sqrt{\frac{P + N}{N}} \right)^{2TW} = \left( \frac{P + N}{N} \right)^{TW}$$
different signal functions of duration T that can be distinguished from one another in spite
of the noise. This corresponds to
$$\log_2 M = TW \log_2 \frac{P + N}{N}$$
binary digits in the time T or
$$C = W \log_2 \frac{P + N}{N}$$
binary digits per second. This formula has a much deeper and more precise signifi-
cance than the above argument would indicate. In fact it can be shown that it is possible,
by properly choosing our signal functions, to transmit $W \log_2 \frac{P + N}{N}$ binary digits per
second with as small a frequency of errors as desired. It is not possible to transmit
binary digits at any higher rate with an arbitrarily small frequency of errors. This
means that the capacity is a sharply defined quantity in spite of the noise. These state-
ments are proved by two different methods.*
The formula for C applies for all values of P/N. Even when P/N is very small, the
average noise power being much greater than the average transmitter power, it is pos-
sible to transmit binary digits at the rate $W \log_2 \frac{P + N}{N}$ with as small a frequency of
errors as desired. In this case $\log_2 (1 + \frac{P}{N})$ is approximated by $\frac{P}{N} \log_2 e = 1.443 \frac{P}{N}$
and we have approximately
$$C = 1.443 \, \frac{PW}{N}.$$
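The capacity formula and its small P/N approximation are easy to evaluate. The function names and the numerical values in this sketch are illustrative assumptions, not figures from the paper.

```python
import math

def capacity(W, P, N):
    """C = W log2((P + N) / N), in bits per second."""
    return W * math.log2((P + N) / N)

def log2_distinguishable_signals(W, P, N, T):
    """log2 M for M = ((P + N) / N) ** (T W) signals of duration T."""
    return T * W * math.log2((P + N) / N)

def low_snr_capacity(W, P, N):
    """Approximation C = (log2 e) W P / N, for P/N much less than 1."""
    return math.log2(math.e) * W * P / N

# With P = N the capacity is one bit per second per cycle of band.
print(capacity(3000.0, 1.0, 1.0))

# With P/N = 0.001 the exact value and the approximation nearly agree.
exact = capacity(1.0, 0.001, 1.0)
approx = low_snr_capacity(1.0, 0.001, 1.0)
print(exact, approx)
```

The constant 1.443 in the text is $\log_2 e$ rounded to three decimals.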
It should be emphasized that it is only possible to transmit at a rate C over a channel
by properly encoding the information. In general, the rate C is only approached as a limit
by using more and more complex encoding and longer and longer delays at both trans-
mitter and receiver. In the white noise case the best encoding is such that the transmitted
signals themselves have the structure of a white noise with power P. The difficulty with
the approximate argument given for that case, and the reason it does not give a sharply
defined capacity, is that the selection of signals is not optimal. The distribution of ampli-
tudes is not Gaussian as it should be.
4. Comparison of Ideal and Practical Systems.
In Figure 4 the curve is the function
$$\frac{C}{W} = \log_2 \left(1 + \frac{P}{N}\right)$$
plotted against P/N measured in db. It represents, therefore, the channel capacity per
unit of band with white noise. The circle and points correspond to PCM and PPM systems
used to send a sequence of binary digits and adjusted to give about one error in $10^5$ binary
digits. In the PCM case the number adjacent to a point represents the number of ampli-
tude levels - 3 for example is a ternary PCM system. In all cases positive and negative
amplitudes are used. The PPM systems are quantized with a discrete set of possible
positions for the pulse, the spacing is 1/2W, and the number adjacent to a point is the num-
ber of possible positions for a pulse.
The series of points follows a curve of the same shape as the ideal but displaced
horizontally about 8 db. This means that with more involved encoding or modulation sys-
tems a gain of 8 db. in power could be achieved over the system indicated.
*See Shannon, C. E., "A Mathematical Theory of Communication" and "Communication
in the Presence of Noise."
Of course, as one attempts to approach the ideal, the transmitter and receiver re-
quired become more complicated and the delays increase. For these reasons there will
be some point where an economic balance is established between the various factors
It may well be, however, that even at the present time more complex systems would be
justified.
A curious fact illustrating the general misanthropic behaviour of Nature is that at
both extremes of P/N (when we are well outside the practical range) the points
in Figure 4 approach more closely the ideal curve. At very large P/N the PCM points
approach to within 10 log10 [illegible] = 4.5 db. of the ideal while with very small P/N the PPM
points approach to within 3 db.
The relation
$$C = W \log\left(1 + \frac{P}{N}\right)$$
can be regarded as an exchange relation between the parameters W and P/N. Keeping the
channel capacity fixed we can decrease the bandwidth W provided we increase P/N suf-
ficiently. Conversely, an increase in band allows a lower signal-to-noise ratio in the
channel. The required P/N in db. is shown in Figure 5 as a function of the band W. It is
assumed here that as we increase W, N increases proportionally:
N = W N0
where N0 is the noise power per cycle of band. It will be noticed that if P/N is large a
reduction of band is very expensive in power. Halving the band roughly doubles the
signal-to-noise ratio in db. that is required.
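The exchange between band and power can be checked numerically. The following is a minimal sketch, assuming logarithms to base 2 and the relation C = W log (1 + P/N) with N = W N0; the function names and the sample values of C and W are illustrative and do not come from the text.

```python
import math

def required_snr_db(capacity, bandwidth):
    """P/N in db. needed to reach `capacity` (bits per second) in a band of
    `bandwidth` cycles, solving C = W log2(1 + P/N) for P/N."""
    snr = 2.0 ** (capacity / bandwidth) - 1.0
    return 10.0 * math.log10(snr)

def required_power(capacity, bandwidth, n0=1.0):
    """Transmitter power P when the noise power grows with band, N = W * N0."""
    snr = 2.0 ** (capacity / bandwidth) - 1.0
    return snr * bandwidth * n0

if __name__ == "__main__":
    C = 10.0  # illustrative capacity, bits per second
    for W in (4.0, 2.0, 1.0, 0.5):
        print(f"W = {W:4.1f}  P/N = {required_snr_db(C, W):6.2f} db.  "
              f"P = {required_power(C, W):12.1f}")
```

With these sample values, going from W = 1 to W = 0.5 takes the required P/N from about 30 db. to about 60 db., which is the "roughly doubles" behaviour described for large P/N.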
The channel capacity C can be calculated in many other cases. A general result that
applies in any situation where the average transmitter power is limited to P is that the
channel capacity is bounded by:
$$W \log \frac{P + N_1}{N_1} \le C \le W \log \frac{P + N}{N_1}$$
where $N_1$ is a parameter called the "entropy power" of the noise. It is defined as the
power in a white noise having the same entropy as the actual noise. N is, as before, the
average noise power.
REFERENCES
Nyquist, H.
    "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal, April 1924, p. 324.
    "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Transactions, Vol. 47, April 1928, p. 617.
Hartley, R. V. L.
    "Transmission of Information," Bell System Technical Journal, July 1928, p. 535.
Shannon, C. E.
    "A Mathematical Theory of Communication," Bell System Technical Journal, July, October, 1948.
    "Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).
Tuller, W. G.
    Sc.D. Thesis, Department of Electrical Engineering, Massachusetts Institute of Technology, 1948.
Wiener, N.
    The Interpolation, Extrapolation and Smoothing of Stationary Time Series, NDRC Report (Forthcoming as a book to be published by John Wiley and Sons, Inc., New York).
    Cybernetics. John Wiley and Sons, Inc., New York, 1948.
Bailey, R. D., and Singleton, H. E.
    "Reducing Transmission Bandwidth," Electronics, August 1948, p. 107.
Note on Certain Transcendental Numbers
Claude E. Shannon
This note calls attention to a certain class of
numbers that are easily shown to be transcendental but seem
to have escaped previous notice. A typical example is the
number
$$X = 2^{-2^{-2^{\cdot^{\cdot^{\cdot}}}}}$$
or more precisely $X = \lim_{n \to \infty} X_n$, $X_{n+1} = 2^{-X_n}$, $X_0 = 2$. It is
easily seen that X exists and satisfies the equation $X = 2^{-X}$.
It is known from a conjecture of Hilbert, proved by Gelfond
and by Schneider, that $a^x$ is transcendental if $a \neq 0, 1$ is
algebraic and x is an algebraic irrational. Now X is clearly
not rational, and if we suppose it an algebraic irrational,
it must then be transcendental, a contradiction. Hence it is
transcendental.
More generally let f be a function such that if
x is algebraic and does not belong to a set S, then f(x) is
transcendental. Let $g_1$ and $g_2$ be algebraic functions and
such that $x \neq g_1 f g_2 x$, $x \in S$. Then the solutions of
$$x = g_1 f g_2 x$$
are transcendental by a similar argument, using the fact that
$g_1^{-1}$ is algebraic. If the sequence $X_n = (g_1 f g_2)^n X_0$ approaches
a limit X it must be transcendental. Some functions known to
have the property required for f are $\sin x$, $e^x$ and $J_0(x)$, the
exceptional set S consisting of the number 0.
C. E. SHANNON
October 27, 1948
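The limit in the typical example can be approximated by direct iteration. This is a minimal sketch, assuming the example is the limit of the recursion X(n+1) = 2^(-X(n)) started at 2, which is the reading of the damaged typescript adopted here; the function name and step count are illustrative.

```python
def iterate_limit(x0=2.0, steps=200):
    """Iterate x -> 2**(-x) starting from x0 and return the final value.

    Near the fixed point the map contracts (its slope there is roughly
    -0.44), so a couple of hundred steps is far more than needed.
    """
    x = x0
    for _ in range(steps):
        x = 2.0 ** (-x)
    return x

if __name__ == "__main__":
    X = iterate_limit()
    print(X, 2.0 ** (-X))  # the two printed values should agree
```

The iteration only supplies a numerical value near 0.641 satisfying X = 2^(-X); the transcendence itself rests on the Gelfond-Schneider argument in the note, not on the computation.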
A CASE OF EFFICIENT CODING FOR A NOISY CHANNEL
Consider a discrete channel with two possible symbols
0 and 1. Noise is assumed to affect successive symbols inde-
pendently and in such a way that the probability of a symbol
being interpreted correctly at the receiver is p = [illegible], while
the probability of incorrect interpretation is q = [illegible]. The
capacity of such a channel is
[equation illegible]
We assume ε very small and approximate log (1 + ε) by [illegible]:
C = [illegible] ε² (natural units)
In bits per symbol, the capacity is:
C = [illegible] log2 e
A very simple code can be constructed for this system
to send a sequence of random binary digits at nearly the rate C
with a quite small frequency of errors; in other words a code
which is not far from the ideal. The code is merely to repeat
each binary digit in the message a large number n of times. At
the receiver, a group of n is received, and the majority report
is taken as the original message symbol.
If the message symbol is 0 then n 0's are transmitted.
At the receiver the n received symbols will be a mixture of
0's and 1's; the number of 0's present will be distributed ac-
cording to a binomial distribution with p = [illegible] and q = [illegible].
For large n the binomial distribution is approximately normal
(and this approximation is especially good when p is close to
1/2). The expected number of 0's is p n, and the standard devia-
tion is:
$$\sigma = \sqrt{npq}$$
An error occurs when the number of received 0's is less than
n/2, i.e. when the actual number of zeros is p n - n/2 away from
the expected number. In terms of σ this is:
$$a = \frac{pn - n/2}{\sigma} \text{ standard deviations.}$$
Hence the frequency of errors is given by the area of a normal
curve with standard deviation equal to unity from a out to ∞.
To obtain a frequency of errors $10^{-3}$, say, we must
have a = [illegible], and the rate is [illegible] as compared with the rate [illegible] in the
ideal (with essentially zero frequency of errors).
November 18,
C. E. SHANNON
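The repetition code with majority decision can be explored numerically. The following is a minimal sketch that takes the probability p of correct reception as a direct input, since the exact way the typescript expresses p in terms of ε is not legible. The relation p = 1/2 + ε/2 used in the small demonstration is an assumption made only for illustration, and all function names are hypothetical.

```python
import math

def capacity_bits(p):
    """Capacity in bits per symbol of a binary channel whose symbols are
    received correctly with probability p: 1 + p log2 p + q log2 q."""
    q = 1.0 - p
    return 1.0 + sum(t * math.log2(t) for t in (p, q) if t > 0)

def majority_error_exact(n, p):
    """Exact probability that fewer than half of n repetitions arrive
    correctly (n odd), so that the majority report is wrong."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k)
               for k in range(0, (n + 1) // 2))

def normal_tail(a):
    """Area of a unit normal curve from a out to infinity."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def majority_error_normal(n, p):
    """Normal approximation: the error threshold n/2 lies
    a = (p n - n/2) / sqrt(n p q) standard deviations below the mean."""
    sigma = math.sqrt(n * p * (1.0 - p))
    return normal_tail((n * p - n / 2.0) / sigma)

if __name__ == "__main__":
    eps = 0.1                 # assumed parametrization: p = 1/2 + eps/2
    p = 0.5 + eps / 2.0
    for n in (101, 401, 1001):
        print(n, 1.0 / n, capacity_bits(p),
              majority_error_exact(n, p), majority_error_normal(n, p))
```

Each printed row shows the rate 1/n of the repetition code next to the capacity, together with the exact and approximate error frequencies, so the trade between rate and error frequency can be read off directly.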
December 6, 1948
Note on Reversing A Discrete Markhoff Process
In "A Mathematical Theory of Communication" a
language was represented by a discrete Markhoff process with
a finite number of possible states. Such a stochastic process
can be represented schematically by means of an oriented linear
graph as in Fig. 1
Consider the question of generating the same language
in reverse; for example, English but read backwards. Can we
always invert a finite state Markhoff process and obtain a
finite state Markhoff process? The answer is "yes" and further-
more the corresponding linear graph has the same topology, but
with reversed orientation on all branches. If the
original process has transition probabilities $p_i(j)$ (probability when in state
i of going to state j), then the reverse process has the same
state probabilities $P_i$ and the transition probabilities given by:
$$q_j(i) = \frac{P_i}{P_j}\, p_i(j)$$
This is true since this $q_j(i)$ is merely the a posteriori probability
for the original process that when in state j the preceding state
was state i. The inverse of Fig. 1 is shown in Fig. 2.
It is interesting to show directly that the entropy
$H_R$ of the reverse process is equal to the entropy $H_F$ of the
forward process. Of course, this must be true a posteriori from
the general properties of entropy. We have
$$P_j q_j(i) = P_i p_i(j)$$
Hence:
$$\sum P_j q_j(i) \log P_j q_j(i) = \sum P_i p_i(j) \log P_i p_i(j)$$
or
$$\sum P_j q_j(i) \log q_j(i) + \sum P_j q_j(i) \log P_j = \sum P_i p_i(j) \log p_i(j) + \sum P_i p_i(j) \log P_i$$
Hence:
$$-H_R + \sum P_j \log P_j = -H_F + \sum P_i \log P_i$$
so that $H_R = H_F$.
C. E. SHANNON
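The reversal can be tried on a small example. This is a minimal sketch, assuming a hypothetical three-state chain chosen only for illustration; the state probabilities are found by repeated multiplication, which is one convenient method and not something the note prescribes.

```python
import math

def stationary(p, iters=2000):
    """State probabilities P_i of a chain with transition matrix p,
    found by repeatedly applying the chain to a uniform start."""
    n = len(p)
    P = [1.0 / n] * n
    for _ in range(iters):
        P = [sum(P[i] * p[i][j] for i in range(n)) for j in range(n)]
    return P

def reverse_chain(p, P):
    """Transition probabilities of the reversed process:
    q[j][i] = P_i * p_i(j) / P_j."""
    n = len(p)
    return [[P[i] * p[i][j] / P[j] for i in range(n)] for j in range(n)]

def entropy_rate(p, P):
    """Entropy per symbol, -sum_i sum_j P_i p_i(j) log2 p_i(j)."""
    n = len(p)
    return -sum(P[i] * p[i][j] * math.log2(p[i][j])
                for i in range(n) for j in range(n) if p[i][j] > 0)

# Hypothetical forward process (each row sums to 1).
p = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.4, 0.0, 0.6]]
P = stationary(p)
q = reverse_chain(p, P)
H_F = entropy_rate(p, P)
H_R = entropy_rate(q, P)

if __name__ == "__main__":
    print("state probabilities:", P)
    print("H_F =", H_F, " H_R =", H_R)
```

A branch that is absent in the forward graph (here from state 3 to state 2) is absent with reversed orientation in the reverse graph, and the two entropies agree to rounding error.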
Outline of Talk
American Statistical Society, December 28, 1949
INFORMATION THEORY
by
C. E. Shannon
Bell Telephone Laboratories, Inc., Murray Hill, N. J.
1. Information Produced by a Stochastic Process
In communication engineering, we are interested in
transmitting messages from one point to another. The messages
generally consist of a sequence of individual symbols, such as
the letters of printed English, which are governed by proba-
bilities. Thus, in English, there are the various letter fre-
quencies, digram frequencies, etc. The "meaning" of the
message (if any) is irrelevant to the engineering problem.
Abstractly, then, we may consider a message to be a sequence of
meaningless symbols produced by a suitable Stochastic process.
Communication systems must be designed to handle the ensemble
of possible messages; the particular one which will actually
occur is not known when the system is constructed. The source
producing messages is assumed to have only a finite number of
possible internal states.
2. Entropy as a Measure of Information
A suitable measure of the amount of Information pro-
duced by a discrete Stochastic process is given by the entropy
H, where
$$H = -\lim_{N \to \infty} \frac{1}{N} \sum p(x_1, \ldots, x_N) \log_2 p(x_1, \ldots, x_N)$$
in which $x_1, \ldots, x_N$ is a sequence of N symbols produced by
the process, $p(x_1, \ldots, x_N)$ is the probability of this sequence,
and the sum is over all sequences of this length.
The significance of the quantity H is that it is pos-
sible to translate messages from a source with entropy H into a
sequence of binary digits (0 or 1) using, on the average, H + ε
binary digits per letter of the original message with any
positive ε. It is not possible to translate so that fewer are
used. Thus H measures, in a sense, the equivalent number of
binary digits per letter of message. It can be shown that H
also determines the amount of channel capacity required for
transmission of the original messages.
[It is also possible to define a conditional] entropy, $H_x(y)$, of one source relative to another. This
measures in a sense the uncertainty per letter of the y sequence
when the x sequence is known, or the amount of additional infor-
mation in the y sequence over that available in the x sequence.
$H_x(y)$ can be defined as follows:
$$H_x(y) = H(x, y) - H(x)$$
where H(x, y) is the entropy of the sequence whose elements are
the ordered pairs (x, y).
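The definition of the relative entropy as H(x, y) - H(x) can be illustrated on a small case. This is a minimal sketch, assuming that successive letter pairs (x, y) are independent, so that the per-letter entropies reduce to entropies of a single pair; the joint distributions and function names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal_x(joint):
    """Distribution of x alone, from a joint {(x, y): probability}."""
    out = defaultdict(float)
    for (x, _y), p in joint.items():
        out[x] += p
    return dict(out)

def conditional_entropy(joint):
    """H_x(y) = H(x, y) - H(x)."""
    return entropy(joint) - entropy(marginal_x(joint))

# Hypothetical sources: x is a fair binary digit in each case.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}  # y unrelated to x
exact_copy = {(0, 0): 0.5, (1, 1): 0.5}                        # y equals x
noisy_copy = {(0, 0): 0.45, (0, 1): 0.05,
              (1, 0): 0.05, (1, 1): 0.45}                      # y is x, wrong 1 time in 10

if __name__ == "__main__":
    for name, joint in (("independent", independent),
                        ("exact copy", exact_copy),
                        ("noisy copy", noisy_copy)):
        print(name, conditional_entropy(joint))
```

The three cases give one bit, zero, and something in between, matching the reading of the relative entropy as the additional information in y once x is known.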
3. The Nature of Information
While the entropy H measures the amount of information
produced by a Stochastic process, it does not define the infor-
mation itself. Thus two entirely different sources might
produce information at the same rate (same H) but certainly they
are not producing the same information. If we translate the
output of a particular source into a different "language" by a
reversible operation, the translation may be said to have the
same information as the original. Thus we are led to consider
the information of a Stochastic process as that which is common
to all translations obtained from the given process by members
of the group G of reversible translations, or, alternatively, as
the equivalence class of all processes obtained from the given
one by such translations. To avoid certain paradoxical situa-
tions, involving infinite internal storage in the transducer
doing the translating, it is desirable to first limit the group
G to translations possible in transducers having a finite
number of possible internal states. The information associated
with a process may be denoted by a single letter, say X. Thus
X = Y means that Y can be obtained by a translation of X, and
conversely. It is possible to set up a metric satisfying the
usual postulates as follows:
$$\rho(x, y) = H_x(y) + H_y(x) = 2H(x, y) - H(x) - H(y).$$
With this metric it is possible to define limiting sequences of
elements, each of which is an information. Thus a Cauchy
sequence, $X_1, X_2, \ldots$, is defined by requiring that
$$\lim_{m, n \to \infty} \rho(X_m, X_n) = 0.$$
The introduction of these sequences as new elements (analogous
to irrational numbers) completes the space in a satisfactory
way and enables one to simplify the statement of various results.
4. The Information Lattice
A relation of inclusion, $x \ge y$, between two infor-
mation elements x and y can be defined by
$$x \ge y \equiv H_x(y) = 0.$$
This essentially requires that y can be obtained by a suitable
finite state operation (or limit of such operations) on x. If
$x \ge y$ we call y an abstraction of x. If $x \ge y$, $y \ge z$, then
$x \ge z$. If $x \ge y$, then $H(x) \ge H(y)$. Also $x > y$ means $x \ge y$,
$x \neq y$. The information element, one of whose translations is
the process which always produces the same symbol, is the 0
element, and $x \ge 0$ for any x.
The sum of two Information elements, z = x + y, is the
process which produces the ordered pairs $(x_n, y_n)$. We have
$$z \ge x, \quad z \ge y$$
and there is no $u < z$ with these properties; z is the least upper
bound of x and y.
The product z = xy is defined as the largest z such
that $z \le x$, $z \le y$; that is, there is no $u > z$ which is an
abstraction of both x and y. The product is unique.
With these definitions information elements form a
metric lattice. The lattice is not distributive, nor even
modular. A non-distributive example is x, y independent
sequences of binary digits, with z the sequence obtained by
mod 2 addition of corresponding symbols in x and y. Then
$$zy + zx = 0 + 0 = 0$$
$$z(x + y) = z \neq 0.$$
The lattices are relatively complemented. There
exists for $x \le y$ a z with
$$z + x = y$$
$$zx = 0.$$
The element z is not, in general, unique.
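The mod 2 example can be examined through its entropies. This is a minimal sketch, assuming x and y are independent fair binary digits and treating one letter triple (x, y, z) at a time, so that single-letter entropies stand in for the per-letter rates; it computes only entropies and the metric 2H(x, y) - H(x) - H(y), and does not construct lattice products directly. All names are illustrative.

```python
import math
from itertools import product

# Joint distribution of one letter triple (x, y, z) with z = x + y mod 2.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def H(indices):
    """Entropy in bits of the coordinates selected by `indices`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in indices)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cond(known, target):
    """H_known(target) = H(known, target) - H(known)."""
    return H(tuple(known) + tuple(target)) - H(tuple(known))

def rho(a, b):
    """Metric rho(a, b) = 2 H(a, b) - H(a) - H(b)."""
    return 2 * H(tuple(a) + tuple(b)) - H(tuple(a)) - H(tuple(b))

X, Y, Z = (0,), (1,), (2,)

if __name__ == "__main__":
    print("H_{x+y}(z) =", cond(X + Y, Z))   # 0: z is an abstraction of x + y
    print("H_x(z)     =", cond(X, Z))       # 1: x alone says nothing about z
    print("rho(x, z)  =", rho(X, Z))
    print("rho(x+y,z) =", rho(X + Y, Z))
```

Since z is determined by the pair (x, y), it lies below x + y, while z is independent of x alone and of y alone. That is consistent with the products zx and zy being the 0 element although z(x + y) = z.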
5. The Delay Free Group $G_1$
The definition of equality for information based on
the group G allows x = y when y is, for example, a delayed
version of x: $y_n = x_{n+a}$. In some situations, when one must
act on information at a certain time, a delay is not permis-
sible. In such a case we may consider the more restricted
group of instantaneously reversible translations. One may
define inclusion, sum, product, etc., in an analogous way, and
this also leads to a lattice but of much greater complexity
and with many different invariants.
Proof of an Integration Formula
C. E. Shannon
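The integral
$$f_N(\alpha) = \int_0^{\alpha} \frac{\sin^2 Nx}{\sin^2 x}\, dx \qquad (1)$$
has arisen in an acoustical problem. It has been evaluated for N = 1, 2, 3, 4 as
equal to
$$g_N(\alpha) = \alpha N + \sum_{i=1}^{N-1} \frac{N - i}{i} \sin 2i\alpha \qquad (2)$$
by R. C. Jones, and he has conjectured that $f_N = g_N$ for all $\alpha$, N. A general
proof follows.
From (1) we have
$$\Delta^2 f_N = f_N - 2 f_{N-1} + f_{N-2} = -\frac{1}{2} \int_0^{\alpha} \frac{\cos 2Nx - 2\cos 2(N-1)x + \cos 2(N-2)x}{\sin^2 x}\, dx$$
and
$$\frac{d}{d\alpha} \Delta^2 f_N(\alpha) = -\frac{1}{2}\, \frac{\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha}{\sin^2 \alpha} \qquad (3)$$
Also from (2)
$$\Delta g_N = \alpha + \sum_{i=1}^{N-1} \frac{1}{i} \sin 2i\alpha$$
$$\Delta^2 g_N(\alpha) = \frac{\sin 2(N-1)\alpha}{N-1}$$
$$\frac{d}{d\alpha} \Delta^2 g_N(\alpha) = 2\cos 2(N-1)\alpha \qquad (4)$$
The equality of (3) and (4) can be established by noting that the numerator of (3) is
$$\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha$$
$$= \mathrm{Re}\left[e^{j2N\alpha} - 2e^{j2(N-1)\alpha} + e^{j2(N-2)\alpha}\right]$$
$$= \mathrm{Re}\left[e^{j2(N-1)\alpha}\left(e^{j2\alpha} - 2 + e^{-j2\alpha}\right)\right]$$
$$= \mathrm{Re}\left[e^{j2(N-1)\alpha}\,(2j)^2 \left(\frac{e^{j\alpha} - e^{-j\alpha}}{2j}\right)^2\right]$$
$$= -\mathrm{Re}\left[4\sin^2\alpha\; e^{j2(N-1)\alpha}\right] = -4\sin^2\alpha\,\cos 2(N-1)\alpha$$
Hence
$$\frac{d}{d\alpha}\Delta^2 g_N(\alpha) = \frac{d}{d\alpha}\Delta^2 f_N(\alpha)$$
but $\Delta^2 g_N(0) = \Delta^2 f_N(0) = 0$, so that
$$\Delta^2 g_N(\alpha) = \Delta^2 f_N(\alpha)$$
also it has been verified that
$$g_1(\alpha) = f_1(\alpha)$$
$$g_2(\alpha) = f_2(\alpha)$$
Hence it follows in general that
$$g_N(\alpha) = f_N(\alpha).$$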
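The formula can be spot-checked numerically. This is a minimal sketch, assuming the integral in question is that of sin² Nx / sin² x from 0 to the upper limit and the closed form is the upper limit times N plus the sum of ((N - i)/i) sin 2i(upper limit) for i from 1 to N - 1, which is the reading of the damaged typescript adopted here. Simpson's rule is used only as a convenient check and is not part of the proof.

```python
import math

def f_N(N, alpha, steps=2000):
    """Simpson's-rule estimate of the integral of sin^2(Nx)/sin^2(x)
    from 0 to alpha (steps must be even, and 0 < alpha < pi)."""
    def integrand(x):
        if abs(x) < 1e-12:
            return float(N * N)          # limiting value as x -> 0
        return (math.sin(N * x) / math.sin(x)) ** 2
    h = alpha / steps
    total = integrand(0.0) + integrand(alpha)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0

def g_N(N, alpha):
    """Closed form: alpha*N + sum_{i=1}^{N-1} ((N - i)/i) sin(2 i alpha)."""
    return alpha * N + sum((N - i) / i * math.sin(2 * i * alpha)
                           for i in range(1, N))

if __name__ == "__main__":
    for N in range(1, 7):
        print(N, f_N(N, 1.0), g_N(N, 1.0))
```

Agreement of the two columns for a handful of values is only a sanity check on the reconstruction of (1) and (2); the difference argument above is what establishes the result for all N.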
A Digital Method of Transmitting Information
2t Is p*«*lM* fey ¥fe*l*u# of eodulaUoe to Xmr
pjroto oao tutpmt of e oystos for *jr&»o*iUia£ Iafor»*Uoa at too
OXpoooo Of otters. Mi« T*risro« car.atmeo *tic* mj se exoasuigfg
i, uaitty of rocoivo* oigoel, ftiiica ess bo rou^iJ/
SMMMHtrwS la *««HM» t>/ S&0 tO £13 1 00
-
ratio*
£• TtttiiBZi 2 1%9? yc**r»p.
S. tlm of troossUooi£A»
ft. BoiOO 4*4 t&O OJKfeOtt*
aoooroX tteojr* of bow tfeooo voriofcioo oro roiotoO «*4 tSm
liivwi»«d oafi will oe &«volopo4 la a forthoofclas soaorwifim.
Bo»oo«r «poofcitt& x-.Ht*M/ *&4 oa&or « sus&ber of oojJUioay 0001*09- -
f ol2ooXm« e^ufitioos
a ■ f if y 10 {*)
3 * « aooouro Of 4ii*t0rtiGji at tftt **««tv*r
t * *f trooonlooiaa
* • bsaa iriiia ©f tro-ts&ittor
ST * aciso j-«w«T £*30|t?fl ti:«t 1» t&O O&iOO ?OW*r
p#r *Ait tw?.i4 oil Hi, *>*«&r*e» tolas
alalia *s flfci is toe rofii«» u^At-? *fi>.:mlaar*tioa
yjUUi ftmi tautt koojMtag rooolToft <|ooli*jr istojr&ottt
oo aor 0100010 t, F «M £ 1a r*rio*»o o*> loo* ft* oo
kooo tl*o gpam ©f t&« foooHoo*
r 1 21
«fcoro £«* an£ % or« too WUl triuioatttor tatar ao4 acl«o
QJQjSgf, **ria« too traaftftlsalast tiao. ^» fcr •sa«pl« t/jr to-
oroosiog btutf wUto oo ooo eoorofioo tra&o&ittor - tU«
m&a&m&t 10 la «a« ooaoo vor* foooroolo »iae* It lit « log-
aritt.ai« *moj o**lag aulto or boaA oJUitfc AlvMoo t&o o*or«r
»jf a ft* tor.
»ro two »*tbfld« of fetter Sag o1&ao1 *» aaloo rotlo «t too ox»«ooo
of boo* «i*to. BoltOor of titooo Jkwovo* Is by oor msw* eftUud
l& too ozobooso. Sfco $roooal aoKomotoa toooriooo o sow ootfaoo
at its t&Uft oosootlollr too aoxtwai e*oias of olgool
pmm* io oofelovoi for o $lm oo** wlata laero*oo* &U 4coo
not «oo£ toot «t« ftfotoa of troaoaiooieo lo • tooorotioaHf
Uool ono for tkoro oro oororol otHor aooo* of iss$*miM* ro-
ooivoi qooJLU* fcooola* f . *. ? *o& * flxoi - «**t tfclo oro too
to to yWlt m ooarlr tAool oireonago roto ootooo* too
anlM 1m Oaa^L fift Um of OOOlloo fcfa* YOl&OC
of too lopot ytoolotlag fomoUoa (too o$oooa faootloo la tolo-
saoao oaa roftle) ot o 00300000 of rofolorXr ooboo* oooyllat
Thus t«8 + 4~£**l ,
Oi *5 --« 4-4-2 + 1
A transmitter for this system could be built in the
following way. A condenser is charged as usual to the sampled
voltage. This voltage is read on a comparator biased up to
half the amount. If the comparator gives a positive indication
an electronic switch is closed feeding a negative pulse of $2^n$
units of charge into the condenser; if not a positive pulse of
$2^n$ units is fed in. The comparator is now switched to control
a new pulse source which produces pulses of $2^{n-1}$ units and the
process is repeated. Thus the circuit feeds in positive or
negative pulses of decreasing magnitude "hunting" for a balance.
At each stage a recorder remembers whether a positive or negative
pulse was used. These positive and negative recordings actually
are the binary representation of the original voltage, as one
can see by reading the above table with -1 replaced by 0. Hence
the receiver of Fig. 4 can be used without alteration in this
system.
Creative Thinking
Up to 100% of the amount of ideas produced, useful good
ideas produced by these signals, these are supposed to be arranged
in order of increasing ability. At producing ideas, we find a
curve something like this. Consider the number of curves produced
here - going up to enormous height here.
A very small percentage of the population produces the
greatest proportion of the important ideas. This is akin to an
idea presented by an English mathematician, Turing, that the human
brain is something like a piece of uranium. The human brain, if
it is below the critical mass and you shoot one neutron into it,
additional more would be produced by impact. It leads to an ex-
tremely explosive [...] of the issue, increase the size of
the uranium. Turing says this is something like ideas in the human
brain. There are some people if you shoot one idea into the brain,
you will get a half an idea out. There are other people who are
beyond this point at which they produce two ideas for each idea
sent in. Those are the people beyond the knee of the curve. I
don't want to sound egotistical here, I don't think that I am
beyond the knee of this curve and I don't know anyone who is. I
do know some people that were. I think, for example, that anyone
will agree that Isaac Newton would be well on the top of this
curve. When you think that at the age of 25 he had produced enough
science, physics and mathematics to make 10 or 20 men famous - he
produced binomial theorem, differential and integral calculus, laws
of gravitation, laws of motion, decomposition of white light, and
so on. Now what is it that shoots one up to this
part of the curve? What are the basic requirements? I think we
could set down three things that are fairly necessary for scien-
tific research or for any sort of inventing or mathematics or
physics or anything along that line. I don't think a person can
get along without any one of these three.
The first one is obvious - training and experience.
You don't expect a lawyer, however bright he may be, to give you
a new theory of physics these days or mathematics or engineering.
The second thing is a certain amount of intelligence or
talent. In other words, you have to have an IQ that is fairly high to do
good research work. I don't think that there is any good engineer
or scientist that can get along on an IQ of 100, which is the
average for human beings. In other words, he has to have an IQ
higher than that. Everyone in this room is considerably above
that. This, we might say, is a matter of environment; intelligence
is a matter of heredity.
Those two I don't think are sufficient. I think there is
a third constituent here, a third component which is the one that
makes an Einstein or an Isaac Newton. For want of a better word,
we will call it motivation. In other words, you have to have some
kind of a drive, some kind of a desire to find out the answer, a
desire to find out what makes things tick. If you don't have that,
you may have all the training and intelligence in the world, you
don't have questions and you won't just find answers. This is a
hard thing to put your finger on. It is a matter of temperament
probably; that is, a matter of probably early training, early child-
hood experiences, whether you will motivate in the direction of scien-
tific research. I think that at a superficial level, it is blended
use of several things. This is not any attempt at a deep analysis at
all, but my feeling is that a good scientist has a great deal of what
we can call curiosity. I won't go any deeper into it than that. He
wants to know the answers. He's just curious how things tick and he
wants to know the answers to questions; and if he sees things, he wants
to raise questions and he wants to know the answers to those.
Then there's the idea of dissatisfaction. By this I don't
mean a pessimistic dissatisfaction of the world - we don't like the
way things are - I mean a constructive dissatisfaction. The idea
could be expressed in the words, "This is OK, but I think things could
be done better. I think there is a neater way to do this. I think
things could be improved a little." In other words, there is con-
tinually a slight irritation when things don't look quite right; and
I think that dissatisfaction in present days is a key driving force
in good scientists.
And another thing I'd put down here is the pleasure in see-
ing net results or methods of arriving at results needed, designs of
engineers, equipment, and so on. I get a big bang myself out of proving
a theorem. If I've been trying to prove a mathematical theorem for
a week or so and I finally find the solution, I get a big bang out of
it. And I get a big kick out of seeing a clever way of doing some
engineering problem, a clever design for a circuit which uses a very
small amount of equipment and gets apparently a great deal of result
out of it. I think so far as motivation is concerned, it is maybe a
little like Fats Waller said about swing music - either you got it or
you ain't. If you ain't got it, you probably shouldn't be doing re-
search work if you don'